PEER  REVIEW,  CITATIONS,  AND 
BIOMEDICAL  RESEARCH  POLICY: 

NIH  GRANTS  TO  MEDICAL  SCHOOL  FACULTY 

PREPARED  FOR  THE  HEALTH  RESOURCES  ADMINISTRATION  AND  THE 
OFFICE  OF  THE  ASSISTANT  SECRETARY  FOR  PLANNING  AND  EVALUATION 
OF  THE  DEPARTMENT  OF  HEALTH,  EDUCATION,  AND  WELFARE 


GRACE  M.  CARTER  R-1583-HEW  DECEMBER  1974 


The  research  described  in  this  report  was  sponsored  by  the  Health  Resources 
Administration  and  the  Office  of  the  Assistant  Secretary  for  Planning  and 
Evaluation,  Department  of  Health,  Education,  and  Welfare,  under  Contract  No. 
NOl-MB-24196  (formerly  NIH  72-4196).  Reports  of  The  Rand  Corporation  do 
not  necessarily  reflect  the  opinions  or  policies  of  the  sponsors  of  Rand  research. 


Published  by  The  Rand  Corporation 


DEPARTMENT  OF  HEALTH,  EDUCATION,  AND  WELFARE 
PUBLIC  HEALTH  SERVICE 

NATIONAL  INSTITUTES  OF  HEALTH 


DATE:  s',  1975 


FROM  :  Special  Assistant  to  the  Director,  DPA,  OADPPE,  NIH 


SUBJECT:  Rand Report R-1583: "Peer Review, Citations, and Biomedical
Research Policy: NIH Grants to Medical School Faculty"
dated December 1974

The Rand Corporation recently completed a two-year study of the
effects  of  Federal  programs  on  academic  health  centers.  It  was 
jointly  sponsored  by  DHEW/OS  and  the  then  BHME.  As  a  part  of  this 
study  an  effort  was  made  to  analyze  the  NIH  peer  review  process  to 
relate  prior  assessment  of  proposed  research  merit  to  the  subsequent 
value  of  the  research  in  terms  of  its  "usefulness"  to  other  scientists; 
this  was  done  within  the  broad  context  of  developing  descriptions  of 
decision  making  processes  that  relate  Federal  policy  measures  and 
desired public policy outcomes. Rand Report R-1583 presents the results
of that effort.

The  subject  study  develops  an  original  and  ingenious  statistical 
methodology  to  examine  several  significant,  perennial  policy  issues 
concerning the effectiveness of NIH management procedures and operations.
Objective measures of research output and quality are
identified based on data concerning NIH grants and grant applications,
and  the  publications  generated  from  grant  supported  research  work. 

These  research  output  measures  are  then  used  to  explore  possible  trends 
in the quality of grant supported research, the operation of the NIH
peer review system, and the effect of budgetary restrictions and changing
priorities on research quality and output.

The Rand Corporation will continue its work on research output
measures, peer review, and related policy issues under a new
eighteen  month  contract  with  the  NIH,  "Analysis  of  NIH  Policies  for 
the  Support  of  Extramural  Research." 


Steven E. Bollt, Sc.D.




Distribution 

OD  Staff 
B/I/D  Directors 

B/I/D  Planning  &  Evaluation  Officers 
ECEA  Members 

PP/CG  Task  Force  Members 

Dr. Seymour Perry

Dr. Ann Kaufman

Dr. Charles Lowe

Dr. Nathaniel Berlin

Dr.  Joan  Turik 

Dr.  Zora  Griffo 

Dr.  William  Gay 

Dr.  Alfred  Webb 

Dr.  Gerard  Green 

Ms.  Jane  Knapp 

Dr.  Thomas  Chase 

Ms.  Claudia  Cox 

Dr.  Robert  Trumble 

Dr.  William  Pollin 



Rand 

SANTA  MONICA,  CA.  90406 


PREFACE 


The  research  in  this  report  was  performed  under  contract  with  the  Bureau  of 
Health  Resources  Development  of  the  Health  Resources  Administration  and  the 
Office  of  the  Assistant  Secretary  for  Planning  and  Evaluation,  Department  of 
Health,  Education,  and  Welfare.  It  is  part  of  a  larger  study  of  the  effects  of  federal 
programs  on  academic  health  centers. 

The  federal  effects  on  the  centers  stem  from  the  activities  of  a  dozen  federal 
agencies  administering  well  over  100  distinct  programs.  The  focus  of  the  study  is  on 
understanding  the  effects  of  these  programs  on  the  operations  of  the  centers  and  on 
the  composition  and  mix  of  their  outputs  of  education,  research,  and  patient  care. 
This  report  deals  with  aspects  of  the  research  related  to  the  administration  of 
research  grants  by  the  National  Institutes  of  Health. 

This  report  was  also  submitted  as  partial  fulfillment  of  the  requirements  for  a 
Ph.D.  degree  in  policy  analysis  from  the  Rand  Graduate  Institute. 



SUMMARY 


To  evaluate  the  effect  of  policies  governing  NIH  research,  it  is  necessary  to 
develop  measures  that  characterize  the  output  of  funded  research  projects.  One  such 
measure  is  the  judgment  of  a  study  section  reviewing  an  application  to  renew  the 
grant.  This  judgment  can  be  assumed  to  be  highly  correlated  with  the  study  section’s 
judgment  of  the  scientific  merit  of  the  research  performed  during  the  preceding 
grant  period.  Since  the  peer  review  process  is  itself  one  of  the  policies  of  NIH  to  be 
examined,  it  is  desirable  to  have  an  output  measure  independent  of  this  process.  In 
this  report  counts  of  citations  of  articles  produced  under  NIH  grants  are  used  as  a 
surrogate  for  the  usefulness  of  the  sponsored  research  to  other  researchers.  The 
scientific  merit  of  the  proposal  to  renew  the  grant  and  citation  data  are  crude 
measures  only  of  the  value  of  the  research  to  science.  The  progress  of  science  is  only 
an  intermediate  step  in  the  attainment  of  the  long  range  goal  of  NIH — improvement 
of  health  care.  Thus,  the  basic  assumption  of  this  report  is  that  in  the  long  run 
science  will  be  an  effective  instrument  in  the  alleviation  of  disease. 

The peer review panels of the National Institutes of Health play a very important
role in the selection of the applications for research grants that will be funded.
This  report  examines  the  judgments  of  applications  for  research  project  grants  from 
medical  schools  between  1968  and  1973  and  analyzes  the  internal  consistency  of 
judgments  by  the  peer  review  panels  and  changes  in  these  judgments  over  time. 

The  proportion  of  competing  applications  that  could  be  funded  has  declined 
during  this  period.  If  the  quality  of  applications  remained  unchanged  or  improved 
and  if  the  dual  review  process  resulted  in  ranking  applications  in  the  order  of 
expected  scientific  merit,  then  the  average  quality  of  funded  applications  would  have 
improved.  The  declining  rate  at  which  renewal  applications  are  recommended  for 
disapproval and the improving priority score received on approved renewal applications
reflect an improvement in the average quality of the grants appearing for
renewal  each  year  between  1968  and  1973.  This  means  that  the  later  study  sections, 
though  composed  for  the  most  part  of  different  people,  verify  the  earlier  study 
sections’  selection  of  the  set  of  grants  that  were  awarded  good  enough  priority  scores 
to fund. It is an indication that the concept of "scientific merit" contains enough
objective  content  that  different  groups  of  people  meeting  several  years  apart  will 
agree  that  one  set  of  grants  is  more  scientifically  meritorious  than  another  set  of 
grants. 

The  correlation  coefficient  between  the  priority  scores  received  on  successive 
occasions  by  the  same  grant  is  approximately  0.4.  This  low  value  testifies  to  both  the 
uncertain  nature  of  scientific  enterprises  and  the  willingness  of  study  sections  to 
make  critical  examinations  of  applications  from  even  well-established  investigators. 
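
The correlation described above can be illustrated with a minimal sketch. The function below computes an ordinary Pearson correlation coefficient between the priority scores a set of grants received on two successive competing reviews. The score values are hypothetical and are not data from this report; the name `pearson_r` is likewise introduced only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical priority scores for the same six grants on two occasions.
first_scores = [150, 200, 180, 250, 300, 220]
renewal_scores = [210, 190, 160, 280, 240, 300]

r = pearson_r(first_scores, renewal_scores)
```

A value of r near 0.4, as reported above, would mean that the earlier score accounts for only about 16 percent (0.4 squared) of the variance in the later score.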

The  rate  at  which  applications  for  renewal  are  disapproved  appears  to  have 
declined  between  1968  and  1973,  but  the  trend  in  the  priority  scores  received  on 
approved  renewal  applications  can  be  explained  entirely  by  the  improved  quality  of 
grants.  The  judgment  of  applications  for  new  projects  reflected  a  lower  rate  of 
disapproval  and  a  worse  average  priority  score  in  FY  1973  than  in  FY  1971-1972. 

Judgments  of  new  applications  are  statistically  related  to  judgments  of  earlier 
applications from the same individual. However, the relationship is weak enough to
allow the largest part of the variance in the priority score to be explained by the
merit  of  the  project.  The  judgment  of  a  renewal  application  is  also  related  to  the 
judgments  of  the  study  sections  on  applications  for  other  projects,  but  it  is  more 
strongly  related  to  the  priority  score  received  on  the  previous  competing  application 
for  the  same  grant. 

The  output  of  research  as  measured  by  citation  data  was  examined  by  selecting 
a  sample  of  747  research  project  grants  and  all  51  program  project  grants  awarded 
to medical school faculty on a competitive basis in fiscal year 1967. A list of approximately
5800 publications produced from 1966 to 1970 under these grants was retrieved
from the Research Grants Index. The Institute for Scientific Information
furnished  a  record  of  all  41,000  citations  of  these  publications  that  occurred  in 
1968-1972  in  journals  covered  by  their  Science  Citation  Index. 

The  first  citation  measure  examined  is  the  production  of  at  least  one  very 
frequently  cited  article.  Of  the  grants  in  the  sample,  14.5  percent  each  produced  at 
least  one  of  the  most-cited  5  percent  of  the  journal  articles  in  the  sample.  The 
average  priority  score  received  on  applications  to  renew  grants  that  produced  at 
least  one  most-cited  article  was  47  points  better  than  would  be  predicted  from  the 
scores  these  grants  received  in  1967.  This  may  be  seen  as  evidence  that  citations  are 
a  measure  of  research  quality  as  evaluated  by  a  panel  of  scientific  peers.  However, 
if  one  assumes  that  citations  are  related  to  research  quality  then  the  result  is 
evidence  that  the  evaluation  of  renewal  applications  is  strongly  affected  by  the 
research  results  of  the  preceding  grant  period. 
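
As an illustration of the first citation measure, the sketch below identifies the grants that produced at least one of the most-cited 5 percent of articles. The input format (pairs of grant identifier and citation count) and the rule of taking the top 5 percent by rank, rounded up, are assumptions made for this example; the report does not state how ties at the cutoff were handled.

```python
def most_cited_grants(articles, top_percent=5):
    """Return the set of grant ids with at least one article among the
    most-cited `top_percent` percent of all articles.

    `articles` is a list of (grant_id, citation_count) pairs.
    The cutoff is by rank, rounded up; ties are broken arbitrarily
    (an assumption of this sketch).
    """
    ranked = sorted(articles, key=lambda a: a[1], reverse=True)
    # Integer ceiling of n * top_percent / 100, with at least one article.
    k = max(1, -(-len(ranked) * top_percent // 100))
    return {grant_id for grant_id, _ in ranked[:k]}


# Hypothetical sample: 20 articles from three grants.
sample = [("G1", 3), ("G2", 50), ("G1", 7)] + [("G3", 1)] * 17
top = most_cited_grants(sample)
share = len(top) / len({g for g, _ in sample})
```

Here `share` is the fraction of grants that produced a most-cited article, the quantity reported above as 14.5 percent for the actual sample.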

The  set  of  grants  that  ultimately  produced  at  least  one  of  the  most-cited  articles 
was  actually  perceived  by  the  peer  review  process  in  1967,  before  the  grants  were 
funded,  as  likely  to  be  more  useful;  and  this  set  was  awarded  a  better  average 
priority  score  than  the  other  grants  in  the  sample.  These  grants  also  received  a 
larger  average  dollar  award  and  a  commitment  of  support  for  a  longer  time  period 
on  the  average  than  the  other  grants  in  the  sample. 

The publications in the sample are from the years 1966-1970, and citations were
retrieved  for  the  years  1968-1972.  To  adjust  the  number  of  citations  retrieved  to 
account for the year of publication, a model is constructed that estimates the number
of citations that occur in year i after publication for each i in [0, 6] for which
data  are  missing.  A  theoretical  estimate  of  the  standard  error  in  a  prediction  of  T, 
the  total  number  of  citations  of  an  article  that  have  occurred  (or  will  occur)  in  years 
0  through  6  following  publication,  is  derived  as  a  function  of  the  year  of  publication. 
More  than  95  percent  of  the  variance  in  T  can  be  explained  by  the  available  data 
and  model  for  publications  in  1966,  1967,  and  1968.  For  the  articles  published  in 
1968  citations  in  only  years  0-4  are  available,  but  from  these  data  one  can  predict 
the  citations  that  will  occur  in  years  5  and  6  with  only  a  small  expected  error. 
Therefore, if citation data for six years following publication are enough to measure
the short term use of research by other researchers, then data on years 0-4 will do
almost as well. The loss of citation data for the fourth year following publication for
the 1969 articles reduces the proportion of the variance in T that
can be explained to 0.72. However, the citation data available for the 1970 publications
contain so little information about T that it is necessary to remove these
publications  from  the  remaining  analysis,  which  is  based  on  the  estimated  T  for  each 
publication. 
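
The adjustment for year of publication can be sketched as follows. The first function follows directly from the sampling windows stated above (citations retrieved for 1968-1972, T covering years 0 through 6 after publication). The second is only a simple ratio estimator built on a hypothetical citation-age profile, `ASSUMED_SHARE`; it is not the model developed in Appendix A, and its numbers are invented for illustration.

```python
CITATION_YEARS = range(1968, 1973)  # citation data retrieved for 1968-1972
MAX_AGE = 6                         # T covers years 0 through 6 after publication


def observed_ages(pub_year):
    """Years after publication (0..6) for which citation data were retrieved."""
    return [y - pub_year for y in CITATION_YEARS if 0 <= y - pub_year <= MAX_AGE]


# Hypothetical share of an article's seven-year citations falling in each
# year after publication (index = age). Assumed values; they sum to 1.
ASSUMED_SHARE = [0.05, 0.15, 0.20, 0.20, 0.16, 0.13, 0.11]


def estimate_total(pub_year, observed_counts):
    """Scale observed citations up to an estimate of T.

    `observed_counts` lists citations for each age in observed_ages(pub_year).
    This ratio estimator is an assumption of the sketch, not the report's model.
    """
    ages = observed_ages(pub_year)
    if len(observed_counts) != len(ages):
        raise ValueError("one count is required for each observed age")
    covered = sum(ASSUMED_SHARE[a] for a in ages)
    return sum(observed_counts) / covered
```

For example, `observed_ages(1968)` gives years 0 through 4 and `observed_ages(1970)` gives only years 0 through 2, which is why the 1970 publications carry so little information about T.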

On the basis of average citation rates to journal articles, each grant was assigned
to one of three categories depending on the principal investigator's department in
the medical school. Grants from most of the basic science departments (biochemistry,
biophysics, microbiology, and physiology) were placed in one category. Grants
from the departments of medicine, obstetrics and gynecology, pathology, and pediatrics
were placed in another category. The remaining group consisted of grants from
those  departments  with  significantly  lower  than  average  citation  rates:  anatomy, 
surgery,  psychiatry,  and  the  remaining  smaller  clinical  departments. 

Many  output  measures  could  be  constructed  from  the  data  on  publications  and 
citations.  To  examine  questions  of  policy  interest  it  was  assumed  that  the  judgment 
of  the  study  sections  on  renewal  applications  represents  an  assessment  of  the  quality 
of  the  research  performed  during  the  preceding  grant  period.  Then  for  a  subset  of 
the  grants  for  which  a  renewal  is  available,  the  score  received  on  the  renewal 
application  was  regressed  on  possible  output  measures  so  that  one  with  a  high 
explanatory  power  could  be  chosen.  This  analysis  was  performed  separately  for 
grants  in  each  of  the  three  categories. 
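
The procedure just described, regressing the renewal score on a candidate output measure separately within each category, might be sketched as below. This is a one-variable least-squares fit written out by hand; the category labels, the data rows, and the function names are hypothetical, and the report's actual regressions may have included more than one explanatory variable.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the output measure does not vary in this group")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return slope, intercept, r2


def fit_by_category(rows):
    """Fit a separate line for each category.

    `rows` is a list of (category, output_measure, renewal_score) triples.
    """
    groups = {}
    for category, measure, score in rows:
        xs, ys = groups.setdefault(category, ([], []))
        xs.append(measure)
        ys.append(score)
    return {c: fit_line(xs, ys) for c, (xs, ys) in groups.items()}


# Hypothetical rows: (category, citation measure, renewal priority score).
rows = [
    ("basic", 10, 200), ("basic", 20, 150), ("basic", 30, 100),
    ("medical", 5, 250), ("medical", 15, 230), ("medical", 25, 180),
]
fits = fit_by_category(rows)
```

Comparing the explained variance (`r2`) across candidate output measures within a category is one way to choose the measure with the highest explanatory power.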

No  output  measure  was  significantly  correlated  with  the  second  priority  score 
for  the  group  of  grants  from  anatomy,  surgery,  and  the  smaller  clinical  departments. 
This  may  be  for  any  one  of  a  number  of  causes:  The  sample  size  is  small,  the 
scientific  fields  in  the  group  are  diverse,  research  quality  may  not  be  expressed  in 
either  publications  or  citations,  or  the  second  priority  score  may  not  accurately 
reflect  the  quality  of  the  preceding  research.  A  means  of  choosing  among  these 
possibilities  is  not  readily  apparent. 

In  the  basic  science  and  medical  groups  the  strength  of  the  relationship  between 
the  second  score  and  the  output  measures  is  not  very  sensitive  to  the  exact  choice 
of  output  measures,  although  average  citation  rates  performed  better  than  total 
citations; and citations of journal articles appear to be more important than citations
of other publications. In none of the cases that included a citation output
measure was there any remaining relationship between second score and any publication
count. Therefore, numbers of publications do not appear to be an additional
measure  of  research  quality  after  citations  have  been  included.  The  variable  that 
has  been  chosen  from  these  regressions  to  represent  research  quality  is  the  average 
number  of  citations  of  all  publications  that  were  cited  at  least  twice  in  the  six  years 
following  publication. 
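
The chosen variable can be written out as a short function. The sketch assumes that each publication's citation count is its estimated seven-year total T and that a grant with no publication cited at least twice has no defined value; the handling of that empty case is an assumption of this example, not something stated in the report.

```python
def citation_measure(citations_per_publication, min_citations=2):
    """Average citations over a grant's publications cited at least twice.

    Returns None when no publication meets the threshold (an assumption
    of this sketch; the report does not say how such grants were treated).
    """
    qualifying = [c for c in citations_per_publication if c >= min_citations]
    if not qualifying:
        return None
    return sum(qualifying) / len(qualifying)


# Hypothetical grant with five publications.
value = citation_measure([0, 1, 2, 10, 30])
```

In this example only the publications cited 2, 10, and 30 times count, so the measure is their mean.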

The  citation  data  are  shown  to  be  related  to  the  priority  score  awarded  in  1967. 
This  relationship  is  additional  evidence  that  the  concept  of  scientific  merit  is  not 
entirely  subjective. 

The  use  of  the  renewal  priority  score  and  the  citation  data  leads  to  the  same 
conclusion in another area also. The correlation coefficient between the score received
on a renewal application and the score received the previous time the grant
was  reviewed  and  funded  is  very  low.  The  citation  measure  is  more  strongly  related 
to  the  score  received  on  subsequent  renewal  applications  than  to  the  score  received 
in  1967  before  the  research  was  performed.  It  seems  clear  that  the  peer  review 
system  is  very  flexible  about  adapting  its  judgment  of  applications  to  reflect  changes 
in  merit.  These  observations  are  consistent  with  the  existence  of  a  great  deal  of 
uncertainty  in  preliminary  judgments  of  the  scientific  merit  of  proposed  research 
and  an  adaptation  by  the  second  study  section  to  the  actual  merits  of  the  research 
and  away  from  the  earlier  appraisal  of  its  potential. 

Program project grants tend to produce fewer publications per dollar than research
project grants, but the average quality of the research performed under these
grants  appears  to  be  slightly  higher  as  measured  by  citations.  Most  of  the  difference 
in  citations  is  explained  by  the  very  small  proportion  of  program  project  grants  that 
produce  articles  cited  fewer  than  an  average  of  ten  times.  Because  of  the  size  of  the 
award,  program  projects  may  be  awarded  only  to  investigators  with  exceptional 
credentials  or  only  for  research  in  areas  where  the  chance  of  significant  results  is 
high. 

The  policies  that  have  governed  NIH  include  dividing  research  funds  for  work 
related  to  disease  categories  according  to  national  priorities  rather  than  according 
to  the  demand  for  support  in  the  area  by  the  biomedical  research  community.  This 
policy  will  be  successful  only  if  good  researchers  can  be  attracted  to  an  area  by  the 
availability  of  funds.  The  data  indicate  that  the  availability  of  funds  does  increase 
the  number  of  applications  in  an  area.  The  judgments  of  the  study  sections  of 
applications  from  principal  investigators  whose  applications  have  been  assigned  to 
more  than  one  Institute  of  NIH  have  been  compared  with  judgments  of  investigators 
whose  work  has  always  been  assigned  to  a  single  Institute.  Using  these  judgments 
as  indicators  of  the  quality  of  the  researchers,  it  appears  that  the  investigators 
whose  applications  are  assigned  to  an  Institute  for  the  first  time,  but  who  have  had 
other  applications  assigned  to  other  Institutes,  are  slightly  superior  as  a  group  to 
those  who  repeatedly  apply  for  support  in  areas  of  interest  to  the  same  Institute. 

The  effect  of  the  level  of  support  for  research  project  grants  on  biomedical 
research  output  is  the  final  policy  question  addressed  in  this  report.  This  level  of 
support  has  been  declining  in  recent  years.  If  the  1967  applications  had  been  funded 
only  until  the  average  priority  score  was  the  same  as  the  average  priority  score  on 
the  grants  funded  in  FY 1973,  then  only  40  percent  of  the  FY  1967  grants  would  have 
been  funded. 

The  ranking  of  applications  in  order  of  decreasing  priority  scores  results  in  a 
ranking  according  to  increasing  values  of  both  available  indicators  of  usefulness  of 
research: the priority score assigned to a subsequent renewal application and citations.
Since within the programs of an Institute the funding of grants is approximately
in priority score order, increasing increments of funding result in the funding
of research grants of lower expected quality. If one assumes that each research
project  represents  a  separate  unit  of  production,  then  it  would  follow  that  there  are 
decreasing  marginal  returns  to  increasing  expenditures.  The  average  value  of  the 
citation  measure  for  the  1967  grants  that  would  have  been  funded  at  the  1973 
funding level is 17.0, and the average value for the grants that would not have been
funded is only 11.2.
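
The comparison above can be sketched as a small simulation. Grants are ranked by priority score (lower is taken to be better, an assumption consistent with NIH practice) and funded in that order for as long as the average score of the funded set does not exceed a target average. The grant data and the target are hypothetical; only the procedure mirrors the text.

```python
def split_by_target_average(grants, target_avg):
    """Fund grants in priority-score order until the funded average
    would exceed `target_avg`.

    `grants` is a list of (priority_score, citation_measure) pairs,
    with lower scores assumed to be better. Returns (funded, unfunded).
    """
    ranked = sorted(grants, key=lambda g: g[0])
    funded = []
    total = 0.0
    for score, measure in ranked:
        if (total + score) / (len(funded) + 1) > target_avg:
            break
        funded.append((score, measure))
        total += score
    return funded, ranked[len(funded):]


def mean_measure(group):
    """Mean citation measure of a group of grants, or None if it is empty."""
    if not group:
        return None
    return sum(measure for _, measure in group) / len(group)


# Hypothetical grants: (priority score, citation measure).
grants = [(120, 25.0), (150, 18.0), (180, 14.0),
          (220, 12.0), (260, 9.0), (300, 11.0)]
funded, unfunded = split_by_target_average(grants, target_avg=150)
```

Comparing `mean_measure(funded)` with `mean_measure(unfunded)` gives the same kind of contrast as the 17.0 and 11.2 figures reported above.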

The  uncertainty  inherent  in  predictions  of  research  output  causes  a  great  deal 
of  variance  in  the  output  of  grants  at  each  priority  score.  Another  way  to  look  at 
the  research  lost  as  a  function  of  the  level  of  expenditure  is  to  concentrate  on  the 
10  percent  of  grants  with  the  highest  citation  rate.  At  each  of  the  levels  of  funding 
investigated,  this  set  of  grants  is  more  heavily  represented  among  those  that  would 
be  funded  than  among  those  that  would  not,  but  the  probability  that  a  grant  at  the 
margin  for  payment  would  be  in  this  set  was  substantial.  At  the  level  approximately 
equal  to  the  1973  pay  level  the  probability  that  a  grant  at  the  margin  for  payment 
would  be  in  this  set  exceeds  the  average  for  all  funded  1967  grants. 


ACKNOWLEDGMENTS 


The  study  benefitted  from  the  guidance  of  Dr.  Carl  D.  Douglass,  Deputy  Director 
of  the  Division  of  Research  Grants  of  the  National  Institutes  of  Health.  He  also 
discovered  some  inaccuracies  in  an  earlier  draft  of  this  report  and  noted  places 
where  clarification  was  needed.  Rand  colleagues  John  E.  Koehler  and  Albert  P. 
Williams  provided  suggestions  and  constructive  criticism  continually  during  the 
course  of  the  study.  Solomon  Eskenazi,  Chief  of  the  Statistics  and  Analysis  Branch 
of  the  Division  of  Research  Grants,  has  been  generous  with  his  own  time  and  the 
time  of  his  staff  in  furnishing  the  IMPAC  data  and  an  understanding  of  its  contents. 
Dr. Franz J. Ingelfinger pointed out his use of citations in assessing the utilization
of  articles  in  the  New  England  Journal  of  Medicine. 

Thanks  are  also  due  to  Dr.  Steven  Bollt  of  the  Division  of  Program  Analysis  of 
the  National  Institutes  of  Health,  Edward  Ignall  of  Columbia  University  and  the 
New  York  City-Rand  Institute,  as  well  as  Arthur  Alexander,  Joseph  Newhouse, 
Tom  Rockwell,  and  Norman  Shapiro,  all  of  Rand,  for  many  useful  suggestions  that 
improved  this  report.  Any  remaining  errors  are  solely  the  responsibility  of  the 
author. 



CONTENTS 


PREFACE

SUMMARY

ACKNOWLEDGMENTS

Section

I.  INTRODUCTION
    Data Sources
    Content of the Report

II.  CHANGES IN THE JUDGMENTS OF INITIAL REVIEW GROUPS OVER TIME
    Disapproval Rate for Renewal Applications
    Priority Score Received on Renewal Applications
    Difference Between Judgments of New and Renewal Applications
    Judgment of New Applications
    Relation Between Renewal Judgment and Judgments of Other Applications
    Summary

III.  METHODOLOGY FOR SAMPLING GRANTS AND OBTAINING CITATION DATA

IV.  MOST-CITED ARTICLES
    Most-Cited Articles and Judgments of Subsequent Renewal Applications
    Characteristics of the Most-Cited Grants

V.  A CITATION MEASURE OF RESEARCH OUTPUT
    Time Pattern of Citations
    Size of Scientific Field
    A Measure of Research Output
    A Model of Research Output
    Characteristics of 1967 Decision

VI.  POLICY QUESTIONS
    The Peer Review Process
    Research Project and Program Project Grants
    Effect of Level of Funding on Applications to Institutes
    Effect of Level of Funding on Output Measures
    Possibilities for Future Research

Appendix

A.  TIME PATTERN OF CITATIONS
B.  THE GROUPING OF GRANTS BY SCIENTIFIC FIELD
C.  MAJOR NIH POLICY ISSUES
D.  ON CITATIONS FOR NEGATIVE REASONS
E.  SUMMARY OF LINEAR RELATIONSHIPS

REFERENCES


I.  INTRODUCTION


This  report  develops  descriptions  of  the  output  of  biomedical  research  projects 
that  could  be  used  to  examine  the  effects  on  research  output  of  some  of  the  policies 
that  govern  the  management  of  the  extramural  research  programs  of  the  National 
Institutes  of  Health.  The  major  part  of  the  study  was  devoted  to  developing  and 
validating  measures  of  research  output.  As  part  of  this  process,  the  study  describes 
the results of past management policies in terms of the output measures and examines
some policy questions.

The research was undertaken as part of a broader study¹ that examines the
effects of federal policies on academic health centers, defined as the organizational
entities that provide education to physicians. That broader study is concerned with
the  entire  range  of  federal  programs  and  their  effects  on  the  education,  patient  care, 
and  research  outputs  of  academic  health  centers.  Because  of  the  focus  of  the  broader 
study,  the  analysis  contained  in  this  report  is  restricted  to  research  performed  in  an 
academic  health  center  (that  is,  in  a  medical  school,  in  a  hospital  with  a  major 
affiliation  with  a  medical  school,  or  in  one  of  the  corporations  set  up  by  some  medical 
schools  for  the  management  of  research). 

Funds for research in academic health centers come from many federal agencies
(including the National Science Foundation, Atomic Energy Commission, Department
of Defense, Veterans Administration, and Health Resources Administration),
but over three-fourths of the federal support for research activities in medical
schools  comes  from  NIH.  For  that  reason,  this  study  concentrates  on  NIH  policy 
rather  than  on  policies  governing  other  federal  agencies  with  respect  to  research  in 
academic  health  centers. 

The central problem in this study is the measurement of research output. Although
the ultimate goal of all the research sponsored by the NIH is an improvement
ment  in  health  care,  the  link  between  basic  research  and  changes  in  health  care  is 
very  difficult  to  trace.  In  addition,  the  length  of  time  that  elapses  between  basic 
research  and  changes  in  health  care  is  often  too  long  for  such  improvement  to  be 
a  useful  criterion  for  evaluating  basic  research  management  policies. 

Much  research  is  used  directly  only  as  an  input  to  further  research,  suggesting 
that  it  might  be  measured  by  its  usefulness  to  other  researchers.  Such  a  measure 
of  research  output  is  based  on  the  belief  that  in  the  long  run  science  will  be  an 
effective  instrument  for  alleviating  disease.  The  value  of  such  a  measure  of  research 
output  is  that  it  can  be  assessed  in  much  shorter  periods  than  are  required  to 
measure  the  health  care  effects  of  basic  research. 

One  way  to  evaluate  the  usefulness  of  a  particular  piece  of  research  to  other 
researchers  is  to  use  citations  to  articles  describing  the  work;  that  is  the  major  device 
used  here.  The  peer  review  process  used  by  NIH  in  selecting  projects  for  possible 
support  provides  other  measures  of  research  output.  I  shall  describe  that  system 
first. 

¹ See Carter et al., 1974, pp. 1-4, for a description of the entire study.

All grants awarded by NIH receive a double review, first for "scientific merit"
by an Initial Review Group (IRG). These groups are also commonly called Study
Sections and consist of 12 to 15 scientists who are experts in the field of the
application. This group votes either to recommend approval or to recommend disapproval
of the application. For applications recommended for approval, the IRG assigns a
priority score by averaging the priority scores assigned by individual members of the
group. The applications are then reviewed for programmatic relevance and policy
considerations by the National Advisory Council of the Institute to which the
application is assigned. These councils include both lay and scientific people; Institute
officials are prevented by law from funding grants not recommended for approval
by the councils. Although the Advisory Councils influence decisions about the funding
levels of programs within the Institute, they only rarely disagree with the
scientific merit priority scores assigned by the IRG to particular grants. The
administrators within NIH are not bound by law to follow the priority order of those
applications recommended for approval by Advisory Councils, but in practice most
of them do so. Thus, the IRG plays a very large role in determining which grants
will be funded.

The priority score is a peer rating of the merits of each research proposal and
a preliminary evaluation of the research itself. For many grant applications, some
work is done before the funding of the application that indicates the kinds of results
that will be obtained. If the study section is aware of the work that has been done,
the priority score is not just an evaluation of the research proposal, but of the
research itself.

The  original  award  of  an  NIH  grant  is  typically  for  three  years.  For  many 
awarded  grants  an  application  to  renew  the  grant  for  an  additional  period  of  time 
is  made  later  and  reviewed  by  an  IRG  and  a  National  Advisory  Council  in  the  same 
manner  as  a  new  application.  By  the  time  the  renewal  application  is  prepared,  the 
researcher  has  had  about  two  years  to  perform  his  work,  and  the  results  of  his  efforts 
should  be  available  to  the  IRG.  The  judgment  rendered  by  the  IRG  on  the  renewal 
application  is,  of  course,  a  judgment  of  the  proposal  for  additional  research,  but  one 
would  expect  that  the  IRG’s  judgment  of  the  scientific  merit  of  the  proposal  for 
additional  research  would  be  highly  correlated  with  the  judgment  that  would  have 
been  made  by  the  same  IRG  if  it  had  been  asked  to  judge  the  scientific  merit  of  the 
research  performed  during  the  preceding  period.  Because  of  the  additional  informa¬ 
tion  available,  the  peer  rating  of  the  renewal  application  should  be  a  more  accurate 
guide  to  the  merits  of  the  original  application  than  the  first  peer  rating  and  thus 
is  a  more  satisfactory  measure  of  research  output. 

Descriptions  of  research  output  are  derived  from  citation  data  and  the  judgment 
of  the  peer  review  panels  and  then  assumed  to  be  ordinally  related  to  the  quality 
of  research.  The  first  policy  question  addressed  is:  How  well  does  the  peer  review 
process  function  as  a  device  for  selecting  the  grants  that  should  receive  support? 
This  peer  review  system  is  based  on  the  view  that  science  is  so  complex  that  only 
working  scientists  have  enough  detailed  knowledge  to  judge  the  scientific  merit  of 
particular grant applications. The Wooldridge Committee,² appointed by President
Johnson in 1964 to review NIH-supported research, found that the peer review
system was responsible for the high quality of NIH-supported research. Dr. Charles
C. Edwards, Assistant Secretary for Health of DHEW, said that the peer review
system is

    the best system anyone knows for allocating a limited amount of funds to
    support biomedical research in this country. Indeed, the NIH peer review
    system is one of the most remarkable accomplishments in the history of
    science administration.³

² The White House, 1965.

However, this system has also been criticized. These criticisms are summarized in
a recent Office of Management and Budget (OMB) document,⁴ which asserts that the
scientific merit judgments are not subject to objective assessment. The OMB document
also calls attention to a possible conflict of interest, because members of an IRG
determine the allocation of research funds of which either they or their institutions
receive a part. Evidence cited in support of this conflict of interest charge includes
the concentration of NIH funds in a few select institutions and the claim that the
system "produces a large number of approved, but unfunded applications that are often
used to support the need for more research funds"⁵ of which the same scientific
community will be the beneficiary. But this same evidence may also be seen as showing
that the best research is carried out in a few select institutions, and that NIH
funding levels have not been adequate to support all scientifically meritorious proposals.

Another  policy  question  addressed  in  this  report  is  the  effect  of  different  levels 
of  funding  on  biomedical  research  output.  As  has  been  widely  discussed,  the  rapid 
growth  in  the  funds  available  to  NIH  characteristic  of  the  period  between  the  end 
of  World  War  II  and  1966-1967  has  ended.  The  period  between  1967  and  1973  has 
seen  only  moderate  growth  in  the  total  dollars  allocated  and  some  years  of  actual 
decline.  The  end  to  significant  real  growth  in  research  funds  was  accompanied  by 
a  shift  to  more  support  of  targeted  research.  These  effects  have  combined  to  result 
in  a  decline  in  the  number  of  traditional  research  project  grants  supported  in 
medical  schools.  At  the  same  time  there  has  been  a  significant  expansion  in  medical 
school  faculty,  because  of  both  the  needs  of  the  expanded  enrollment  mandated  by 
capitation  and  the  opening  of  many  new  medical  schools.  Increased  competition  and 
scarcer funds resulted in a drop in the fraction of applications for traditional
research projects that could be funded, from 0.43 in 1968 to 0.26 in 1973. The effects
of this decline on the available measures of research output are explored.

The  policies  that  have  governed  NIH  include  dividing  research  funds  for  work 
related  to  disease  categories  according  to  national  priorities  rather  than  according 
to  the  demand  for  support  in  the  area  by  the  biomedical  research  community.  This 
policy  will  be  successful  only  if  good  researchers  can  be  attracted  to  an  area  by  the 
availability  of  funds.  I  have  therefore  examined  the  quality  (as  measured  by  the  peer 
review  panels)  of  researchers  who  have  requested  support  in  areas  of  interest  to 
more  than  one  of  the  Institutes  of  NIH. 

The  work  presented  in  this  report  concerns  only  two  of  the  many  funding  instru¬ 
ments  used  by  NIH  to  support  research:  traditional  research  project  grants  and 
program  project  grants.  Both  of  these  mechanisms  support  basic  research  initiated 
by  the  investigator.  The  difference  between  them  is  primarily  one  of  scale.  The 
research  project  grants  are  usually  smaller  and  support  a  discrete,  circumscribed 
project,  in  most  cases  by  a  single  investigator.  Program  project  grants  are  for  the 
support of a broadly based and usually long term program of research activity
"directed toward a range of problems within a broad category, having a central
research focus, rather than a specific single purpose."⁶ Although I have considered
only these two funding mechanisms, some of the methodology developed in the
course of this work could be used to examine research produced through other
mechanisms.

³ Edwards, 1974.
⁴ OMB, 1973.
⁵ Ibid., p. 7.


DATA  SOURCES 

The  data  used  in  this  study  come  from  the  following  sources: 

•  The  Information  for  Management  Planning  Analysis  and  Coordination 
(IMPAC) file created and maintained by the Division of Research Grants
of  the  NIH.  The  subset  of  the  file  used  in  this  work  contains  all  applications 
for  research  grant  support  (both  funded  and  unfunded)  from  medical  school 
faculty  in  FY  1968  through  1973,  and  funded  applications  for  FY  1967. 

•  The  Research  Grants  Index,  which  annually  lists  publications  produced 
through  grants  supported  by  NIH.  Publications  were  selected  from  this 
index  for  FY  1967  through  1970. 

•  The Science Citation Index of the Institute for Scientific Information. This
index  contains  records  on  all  citations  made  in  approximately  365,000 
journal  articles  per  year,  drawn  from  2200  journals,  approximately  60 
percent  of  which  are  in  a  biomedical  field.  The  data  used  in  this  study 
consist  of  all  recorded  citations  to  the  selected  publications  occurring  be¬ 
tween  1968  and  1972. 


CONTENT  OF  THE  REPORT 

The  methodology  used  in  this  report  is  statistical  multivariate  analysis,  which 
tests  hypothesized  relationships.  This  form  of  analysis  has  some  limitations.  The 
foremost  of  these  is  that  statistical  analyses  can  never  prove  causality.  The  most  we 
can  conclude  from  such  analysis  is  that  the  data  either  are  or  are  not  consistent  with 
the  existence  of  the  hypothesized  relationships.  It  is  always  possible  that  other, 
unmeasured  variables  are  the  cause  of  observed  relationships. 

Some  of  the  hypothesized  relationships  are  derived  from  assumptions  that,  if 
true,  would  limit  the  usefulness  of  the  output  measures  for  examining  questions  of 
research  policy.  For  example,  in  Sec.  II,  the  data  on  the  peer  review  judgments  are 
examined  for  relationships  between  judgments  of  a  funded  grant  and  a  subsequent 
renewal  application  and  for  changes  in  the  judgments  of  the  peer  review  groups  over 
time. 

Section  III  describes  the  methodology  used  to  select  a  sample  of  research  grants 
for which citation data were retrieved. In Sec. IV, I consider the simplest possible
research  output  measure  that  can  be  derived  from  citations — the  production  of  at 
least  one  very  frequently  cited  article.  There  should  be  a  positive  relationship  be¬ 
tween  measures  of  research  output  derived  from  the  two  kinds  of  data;  research  that 
is cited more frequently should tend to receive better priority scores. Such a
statistically significant relationship is found and tends to bolster the face validity
of each of the output measures.

⁶ Division of Research Grants, 1971, p. 101.

In  Sec.  V,  I  examine  some  of  the  problems  encountered  in  using  frequency  counts 
of  citations  as  measures  of  research  output.  For  example,  how  long  must  elapse 
following  publication  before  citation  counts  are  meaningful  measures  of  research 
output?  The  concluding  section  presents  the  implications  of  the  preceding  analysis 
for  the  four  policy  issues  mentioned  above:  the  peer  review  system,  the  shifting 
emphasis  on  different  disease-specific  research  areas,  the  difference  between  re¬ 
search  project  and  program  project  grants,  and  the  effect  of  the  level  of  funding  on 
research  output. 

Appendixes  A  and  B  provide  technical  details  on  the  work  described  in  Sec.  V. 
Appendix  C  is  a  brief  review  of  the  two  major  policy  issues  in  NIH  history:  the  total 
level  of  funding  and  the  support  of  basic  rather  than  targeted  research  efforts.  This 
review  puts  the  current  work  in  perspective  and  describes  some  other  analyses  that 
have  been  or  could  be  performed  to  illuminate  the  effects  of  alternative  methods  of 
supporting  biomedical  research.  It  was  written  in  the  early  part  of  1973.  Appendix 
D  describes  an  attempt  to  find  out  how  frequently  citations  of  articles  are  made  to 
discuss  possible  errors  in  the  cited  article.  In  order  to  facilitate  references  to  data 
presented  in  many  tables  in  the  text,  all  the  linear  relationships  that  are  estimated 
in  Secs.  II  and  V  are  summarized  in  the  two  tables  in  Appendix  E. 


II.  CHANGES IN THE JUDGMENTS OF INITIAL
REVIEW  GROUPS  OVER  TIME 


The proportion of competing applications for research project grants¹ that could
be  supported  by  the  NIH  dropped  between  1968  and  1973.  There  has  been  much  year 
to  year  variation  in  the  number  of  applications  that  can  be  funded  because  of 
variations  in  NIH  appropriations  and  because  varying  amounts  of  funds  are  re¬ 
quired  to  fund  noncompeting  continuation  applications,  which  are  treated  as  a 
moral  commitment.  However,  a  declining  trend  in  the  fraction  of  competing  appli¬ 
cations  that  can  be  funded  is  evident  in  Fig.  1.  This  decline  is  due  both  to  a  rise  in 
the  number  of  applications  for  these  grants  (from  3586  in  1968  to  4191  in  1973),  and 
to  a  decline  in  the  number  of  applications  that  could  be  funded  (from  1535  in  1968 
to  1080  in  1973).  Figure  2  shows  histograms  of  the  total  number  of  applications  and 
funded  applications  for  each  fiscal  year  between  1968  and  1973. 
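
As a quick arithmetic check, the application counts quoted in this paragraph reproduce the funded fractions of 0.43 (1968) and 0.26 (1973) cited in Sec. I. The Python sketch below is illustrative only; the variable names are assumptions and do not correspond to fields of the IMPAC file.

```python
# Counts of competing research project grant applications quoted in the text.
applications = {1968: 3586, 1973: 4191}
funded = {1968: 1535, 1973: 1080}

# Fraction of competing applications that could be funded in each year.
fraction_funded = {
    year: funded[year] / applications[year] for year in applications
}

for year in sorted(fraction_funded):
    print(year, round(fraction_funded[year], 2))
```

The decline therefore reflects both a larger denominator (more applications) and a smaller numerator (fewer funded grants), as the text states.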

In this section I examine data² from the IMPAC file to determine whether the
Initial Review Groups have changed their evaluations of applications in response to
the declining funding levels. If these changes were found to be very large, they would
limit the usefulness of the peer review groups' judgments as measures of research
output. In addition, some of the data to be presented here are relevant to an
evaluation of the integrity of the peer review process.

Fig. 1 - Fraction of competing research project grant applications that could be funded

Fig. 2 - Histogram of competing research project grant applications. Funded applications
shown in cross-hatch (data on 1967 non-funded applications are missing)

¹ Those with a program code of R01 and an application type of 1, 2, or 9 on the IMPAC file.

² The analysis is confined to the 47 study sections of the Division of Research Grants and to research
project grant applications from medical schools, their research institutes, and hospitals with a major
affiliation with medical schools. Deferred applications have been omitted.

In  the  first  part  of  this  section  I  concentrate  on  competing  applications  for 
renewal  of  existing  grants  and  determine  the  extent  to  which  changes  in  the  judg¬ 
ments  of  such  applications  can  be  ascribed  to  the  improved  quality  of  these  appli¬ 
cations  rather  than  to  changes  in  funding  levels.  I  then  compare  the  judgment  of 
the  Initial  Review  Groups  on  new  projects  with  their  judgments  on  renewal  projects. 
Finally,  I  deal  with  changes  in  the  IRGs’ judgment  of  new  applications  and  with  the 
relationship  between  the  judgment  of  the  new  study  section  on  a  particular  appli¬ 
cation  and  their  judgments  of  previous  applications  from  the  same  principal  inves¬ 
tigator. 

The judgments of the study sections have two components: the priority score³
awarded to applications recommended for approval and the percentage of applications
recommended for disapproval. As can be seen from Fig. 3, there was a decline
between 1968 and 1973 in the percentage of both new and renewal applications that
were disapproved.⁴ The average priority score received on new applications was
unchanged between 1968 and 1972⁵ and slightly worse in 1973 (see Fig. 4). However,
for renewal applications the average priority score improved (i.e., declined) between
1968 and 1972 and increased in 1973 at a smaller rate than for new applications.
The improving trend in the average priority score received on renewal applications
is significant at the 0.01 level, as is an analysis of variance of the effect of year.

Fig. 3 - Study section judgments over time: fraction disapproved (horizontal axis: fiscal year)

³ Each member of a study section awards a score for the application in the range between 1 (best) and
5 (worst). These scores are then averaged and multiplied by 100 to yield the priority score for the
application.


⁴ Strictly speaking, the study sections only make recommendations to the National Advisory Councils,
which actually approve or disapprove applications. In practice the Councils usually follow the
recommendation of the Study Section. For the sake of brevity I describe the Study Sections' actions as
approval or disapproval rather than as recommending approval or disapproval.

⁵ An analysis of variance shows no statistically significant difference during 1968-1972 at the 10 percent level.
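
The priority score arithmetic described in the footnote on study section scoring (individual ratings from 1 to 5, averaged and multiplied by 100) can be sketched as follows. This is a minimal illustration, not NIH's actual procedure: the function name and the example ratings are hypothetical, and the sketch assumes every member's rating counts equally.

```python
from statistics import mean


def priority_score(member_scores):
    """Average the members' ratings (1 = best, 5 = worst) and multiply by 100."""
    scores = list(member_scores)
    if not scores:
        raise ValueError("at least one member score is required")
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("each score must lie between 1 (best) and 5 (worst)")
    return 100 * mean(scores)


# Hypothetical ratings from a four-member panel.
example_score = priority_score([1.5, 2.0, 2.5, 2.0])
print(example_score)
```

Under this definition a lower priority score is a better score, which is why an "improving" average in the text corresponds to a numerically declining one.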


Fig. 4 - Study section judgments over time: priority score (vertical axis: average
priority score, 170 to 250; horizontal axis: fiscal year, 1968 to 1973; one plotted
line is labeled "All funded grants")


In  view  of  the  constant  or  increasing  priority  scores  received  by  new  appli¬ 
cations,  the  improvement  in  renewal  scores  requires  an  explanation.  One  possibility 
is  that  study  sections  are  becoming  anxious  about  funding  grants  to  those  who  have 
proven  track  records.  Study  sections  might,  perhaps  unconsciously,  award  them 
better  scores  than  they  would  have  received  for  an  application  of  the  same  merit  at 
an  earlier  time.  However,  another  explanation  can  be  found  in  the  delayed  effect  of 
the  tighter  budgets  that  NIH  experienced  in  the  early  part  of  this  period.  As  compe¬ 
tition  increased,  a  declining  fraction  of  applications  could  be  funded  (see  Fig.  1).  If 
the  peer  review  process  resulted  in  the  selection  of  the  best  grants  each  year,  then, 
since  within  each  program  of  an  Institute  grants  are  funded  approximately  in 
priority  score  order,  the  funding  of  a  smaller  fraction  of  grants  would  imply  the 
funding  of  a  set  of  grants  of  higher  average  quality.  This  in  turn  would  imply  that 
the  grants  appearing  for  renewal  in  subsequent  years  would  be  of  higher  average 
quality  each  year,  and  that  the  disapproval  rate  should  fall  and  average  priority 
scores  should  improve.  The  average  score  received  on  funded  applications  improved 
dramatically  as  the  fraction  of  applications  that  were  funded  declined.  This  can  be 
seen  in  the  bottom  line  in  Fig.  4.  In  the  following  I  examine  whether  this  improve¬ 
ment  in  priority  scores  on  funded  applications  can  explain  the  observed  trends  in 
the  judgments  of  renewal  applications  by  the  study  sections. 


DISAPPROVAL  RATE  FOR  RENEWAL  APPLICATIONS 

To  determine  how  much  of  the  decline  in  the  fraction  of  renewal  applications 
disapproved  can  be  explained  by  the  score  the  grant  received  the  previous  time  it 
was  reviewed  and  funded,  I  estimate  the  parameters  of  a  function  that  describes  the 
probability  that  a  renewal  application  will  be  recommended  for  disapproval  in  terms 
of  the  priority  score  the  grant  received  the  previous  time  it  was  reviewed  and  funded 
and  the  year  in  which  the  renewal  was  processed.  Since  the  probability  of  disapprov¬ 
al  of  applications  appearing  for  renewal  for  the  first  time  may  differ  from  that  of 
longer  established  grants,  I  also  consider  a  variable  that  equals  1  if  the  preceding 
application was also a renewal application as a possible determinant of the probability
of disapproval. A logit equation is used that assumes the probability of disapproval,
p, is related to the explanatory variables as follows:

\[ L(p) = \ln\frac{p}{1-p} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + e \qquad (1) \]

where x₁ is the priority score received on the previous application,

x₂ is 1 if the previous application was itself a renewal and 0
otherwise,

x₃ is a trend variable to account for the effect of year; x₃ = year
of application minus 1968, and

e is the error term.

The coefficients associated with each xᵢ are shown in Table 1. To estimate the
probability that a particular renewal application would be disapproved, one would
determine the value of each xᵢ for that grant and then calculate L(p) from Eq. (1).
The probability is then given by:

\[ p = \frac{1}{1 + e^{-L(p)}} \qquad (2) \]


Table 1

PROBABILITY A RENEWAL APPLICATION WILL BE DISAPPROVEDᵃ

Number of data pointsᵇ              3020
Chi-square                          202.4 [3 df]
Constant term                       -3.371
Earlier priority score              0.012ᵈ    (11.9)
Earlier application is renewal      -0.105    (1.0)
Trend                               -0.150ᵈ   (3.5)

ᵃ The estimation technique is a maximum likelihood logit technique developed by
Marc Nerlove, a Rand Corporation consultant at Northwestern University, and
Kenneth Maurer of The Rand Corporation. Coefficients whose significance levels
are not noted are not significant at the 0.05 level.

ᵇ Because of data availability, the sample is restricted to renewal applications
for which the previous competing application was funded in FY 1967 or later.

ᶜ Asymptotic t-statistics are in parentheses.

ᵈ Significance level < 0.001.


Roughly  speaking,  the  significance  level  of  a  coefficient  of  the  equation  gives  the 
probability  that  a  set  of  data  would  yield  a  coefficient  of  this  magnitude  if  the 
independent  variable  were  not  related  to  L(p).  Thus,  there  is  only  one  chance  in  1000 
that  the  probability  that  a  renewal  application  will  be  disapproved  has  remained 
stable  over  time.  The  trend  is  toward  fewer  disapprovals.  The  probability  of  disap¬ 
proval  is  also  strongly  related  to  the  score  received  the  previous  time  the  grant  was 
reviewed,  but  it  does  not  depend  on  whether  the  previous  application  was  itself  a 
renewal. 
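
The calculation described by Eqs. (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the coefficients are the rounded values printed in Table 1, the error term is omitted, and the function names and the example application are hypothetical. Because the printed coefficients are rounded, results from this sketch should not be expected to reproduce the report's own estimates exactly.

```python
import math

# Coefficients as printed in Table 1 (rounded): constant, earlier priority
# score, earlier-application-is-renewal indicator, and trend.
A0, A1, A2, A3 = -3.371, 0.012, -0.105, -0.150


def logit_index(prev_score, prev_was_renewal, year):
    """L(p) from Eq. (1), with the error term omitted."""
    x2 = 1 if prev_was_renewal else 0
    x3 = year - 1968
    return A0 + A1 * prev_score + A2 * x2 + A3 * x3


def disapproval_probability(prev_score, prev_was_renewal, year):
    """p from Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-logit_index(prev_score, prev_was_renewal, year)))


# Hypothetical application: earlier score 223.2, earlier application new, FY 1971.
p_example = disapproval_probability(223.2, False, 1971)
print(round(p_example, 3))
```

With these inputs the estimated probability is roughly 0.24. A worse (higher) earlier score raises the estimate, and a later year lowers it, matching the signs of the coefficients discussed above.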

The  reason  for  the  apparent  increase  in  the  rate  at  which  renewal  applications 
are  recommended  for  approval  is  not  clear.  It  may  be  that  the  study  sections  are 
hoping  to  influence  budget  decisions  with  a  larger  number  of  approved  but  unfunded 
grants.  It  also  may  in  part  be  self-selection  by  the  grantees.  That  is,  as  funds  become 
tighter,  those  who  have  made  little  progress  on  their  previous  grant  may  become 
reluctant  to  apply  for  a  renewal  because  they  feel  their  chance  for  funding  is  very 
small, and a substantial effort is involved in application. In future work I plan to
check on the rate at which renewal applications are received, but this has not yet
been done.

Table 2 permits comparison of the magnitude of the effect of time with the
magnitude of the effect of the improved quality of the grants appearing for renewal.
The third row shows the actual fraction of these renewal applications recommended
for disapproval each year. In the next row the logit equation with trend (from Table
1) is used to estimate the fraction of grants that would have been disapproved each
year if the distribution of earlier priority scores on the applications appearing for
renewal had been the same as the one that actually occurred in 1971. The entry for
1969 was calculated by first evaluating the logit Eqs. (1) and (2) for each 1971
application, but with x₃ set to 1. The entry is the average of the resulting values of
p. Entries for years 1970-1973 were calculated similarly with the corresponding
value of x₃. This row holds the distribution of earlier priority scores constant and
displays the magnitude of the real change in disapproval rates over time: from 0.238
in 1969 to 0.150 in 1973. The final row shows the estimate evaluated at 1971 using
each year's actual distribution of priority scores. It was calculated by evaluating the
logit equations for each year's applications, but with x₃ set to 3 (that is, 1971).
Therefore, this row shows an estimate of the magnitude of the change in disapproval
rate that can clearly be attributed to the improving quality of the grants appearing
for renewal: from 0.236 in 1969 to 0.148 in 1973. The magnitudes of the effects of
time and of the improved quality of the grants are roughly equal.
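
The two standardized rows described in this paragraph can be sketched as follows. The sketch assumes the rounded coefficients printed in Table 1 and uses a small set of HYPOTHETICAL applications, since the underlying IMPAC records are not reproduced here; all names are illustrative. It shows only the mechanics of holding one factor fixed while the other varies.

```python
import math

# Coefficients as printed in Table 1 (rounded).
A0, A1, A2, A3 = -3.371, 0.012, -0.105, -0.150


def p_disapproval(prev_score, prev_was_renewal, x3):
    """Eqs. (1) and (2) with the error term omitted."""
    index = A0 + A1 * prev_score + A2 * prev_was_renewal + A3 * x3
    return 1.0 / (1.0 + math.exp(-index))


def mean_rate(applications, x3):
    """Average estimated disapproval probability over a set of applications."""
    return sum(p_disapproval(s, r, x3) for s, r in applications) / len(applications)


# HYPOTHETICAL data: (earlier priority score, 1 if earlier application was a renewal).
applications_by_year = {
    1969: [(260, 0), (240, 0), (230, 1)],
    1971: [(230, 0), (220, 1), (215, 0)],
    1973: [(205, 1), (200, 0), (190, 1)],
}
BASE_YEAR = 1971

# Hold the 1971 score distribution fixed and vary only the trend variable.
adjusted_to_base_distribution = {
    year: mean_rate(applications_by_year[BASE_YEAR], year - 1968)
    for year in applications_by_year
}

# Hold the trend variable at 1971 and vary only the score distribution.
base_rate_at_each_distribution = {
    year: mean_rate(apps, BASE_YEAR - 1968)
    for year, apps in applications_by_year.items()
}
```

By construction the two series coincide in the base year, as the two estimate rows of Table 2 do for 1971 (0.191 in both).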


Table 2

ESTIMATES OF THE PROBABILITY THAT A RENEWAL APPLICATION WILL BE DISAPPROVEDᵃ

                                          1969    1970    1971    1972    1973
Number of renewal applications             112     578     656     910     744
Average preceding priority score         243.3   228.0   223.2   208.1   198.5
Fraction disapproved
  Actual                                 0.295   0.218   0.203   0.154   0.105
  Estimate adjusted to 1971 priority
    score distribution                   0.238   0.213   0.191   0.170   0.150
  Estimate of 1971 rate at each year's
    priority score distribution          0.236   0.204   0.191   0.166   0.148

ᵃ The table uses only renewal applications for which the previous competing
application was funded in FY 1967 or later.


PRIORITY  SCORE  RECEIVED  ON  RENEWAL  APPLICATIONS 

For any one grant, the score received on a renewal application, S, and the score
received by the same grant the previous time it was reviewed and funded, x₁, should
not be expected to be the same, because of the uncertainty involved in any research enterprise.
Between  the  two  peer  judgments,  the  investigator  may  have  produced  evidence  to 
show  his  approach  is  more  promising  than  the  study  section  had  first  thought,  or 
he  may  have  been  able  to  make  much  less  progress  than  hoped  for.  Discoveries  in 
other  areas  of  research  may  alter  perceptions  about  which  areas  are  most  important 
to  pursue  and  which  are  most  likely  to  succeed.  In  addition,  there  is  the  problem  of 
the  error  of  measurement  in  each  priority  score;  that  is,  a  study  section  composed 
of  different  people  reviewing  the  same  application  at  the  same  time  would  have 
produced  a  slightly  different  priority  score. 

The correlation coefficient between S and x₁ shows the magnitude of the relation
between these two variables. If S could be predicted perfectly as a linear function
of x₁, then the correlation coefficient would be 1. If there were no linear relationship,
then the correlation coefficient would be 0. Since it was possible that new and
renewal applications are treated somewhat differently, I divided these pairs of priority
scores into two categories, depending on whether x₁ (the score on the earlier
application) came from a new or from a renewal application. The correlation coefficient
between x₁ and S was 0.40 if x₁ came from a new application and 0.37 if x₁ came
from a renewal application. These correlations are quite low and indicate that S
cannot be predicted very well from x₁. Although it is at least theoretically possible
for this low value to be due entirely to errors of measurement, it would require that
priority scores be extremely unreliable (that is, with reliability coefficients of the
order of 0.4). Since from the limited amount of information available such low values
seem unlikely,⁶ it is more reasonable to assume that there actually is a large difference
in the scientific merit of the two proposals and that the study section judgments
are not judgments just of the individual performing the research but also of the
project at that moment of time. A project returning to competition for the third or
more time may have a score quite different from its previous scores. Even these
grants, presumably from well-established investigators, appear to be examined quite
critically.
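
The reliability argument in this paragraph can be made concrete with the standard attenuation relation from classical test theory, under which an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The report does not state this formula explicitly, so its use here is an assumption, and the function names are illustrative.

```python
import math


def disattenuated_correlation(r_observed, reliability_a, reliability_b):
    """Correlation after correcting for measurement error in both scores."""
    return r_observed / math.sqrt(reliability_a * reliability_b)


def reliability_needed(r_observed, r_true=1.0):
    """Common reliability of both scores that would turn r_true into r_observed."""
    return r_observed / r_true


# If the two judgments agreed perfectly apart from measurement error, an
# observed correlation of 0.40 would require a reliability of about 0.4.
needed = reliability_needed(0.40)

# With the reliabilities of 0.89 to 0.95 reported in the 1965 NIH study cited
# in the text, the corrected correlation stays well below 1.
corrected_low = disattenuated_correlation(0.40, 0.95, 0.95)
corrected_high = disattenuated_correlation(0.40, 0.89, 0.89)
```

Even at the lower reported reliability the corrected correlation is only about 0.45, which supports the reading that most of the disagreement between the two scores reflects real differences between the proposals.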

Regression  analysis  is  used  to  examine  the  joint  relationship  between  a  depen¬ 
dent  variable  and  several  possible  explanatory  variables.  In  this  case  we  are  inter¬ 
ested  in  the  relationship  between  S,  the  score  received  on  a  renewal  application,  and 
the  same  set  of  explanatory  variables  used  for  the  probability  of  disapproval.  The 
form  of  the  relationship  is  assumed  to  be: 

\[ S = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + e , \]

where again x₂ is 1 if the previous application was itself a renewal and 0
otherwise,

x₃ = year of application minus 1968.


⁶ The only data available on the size of the error of measurement are from the 1965 NIH study by
Saunders  and  Gordon  that  estimated  the  reliability  of  the  priority  score  to  range  from  0.89  to  0.95.  These 
estimates  were  derived  as  a  by-product  of  an  investigation  with  a  different  purpose  and  were  based  on 
the  variance  in  the  ratings  of  the  individual  scientists  who  compose  a  study  section,  rather  than  on  ratings 
of  the  same  application  by  different  study  sections.  Since  it  is  probable  that  votes  of  the  members  of  the 
same  study  section  are  not  independent,  these  values  probably  overestimate  the  reliability. 




The  coefficients  of  the  relationship  are  displayed  in  Table  3  along  with  signifi¬ 
cance  levels.  After  one  controls  for  the  priority  score  received  on  the  immediately 
preceding  competing  application  and  its  type,  no  trend  remains  in  the  data.  Thus, 
one  can  conclude  that  the  study  section  behavior  in  awarding  priority  scores  to 
renewal applications has not changed over this period⁷ and that the improving trend
in  the  priority  scores  is  due  entirely  to  an  improvement  in  the  average  quality  of 
the  grants  appearing  for  renewal. 
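
To show how the fitted relationship is read, the sketch below evaluates the regression using the coefficients printed in Table 3. It is an illustration only: the error term is omitted, the function name and example inputs are hypothetical, and the trend coefficient is included even though the text reports it as not statistically significant.

```python
# Coefficients as printed in Table 3: constant, earlier priority score,
# earlier-application-is-renewal indicator, and trend.
A0, A1, A2, A3 = 123.5, 0.546, -7.10, 0.845


def expected_renewal_score(prev_score, prev_was_renewal, year):
    """Fitted value of S for a renewal application (error term omitted)."""
    x2 = 1 if prev_was_renewal else 0
    x3 = year - 1968
    return A0 + A1 * prev_score + A2 * x2 + A3 * x3


# Hypothetical grants whose earlier applications were new, reviewed in FY 1971.
strong_earlier = expected_renewal_score(200, False, 1971)
weak_earlier = expected_renewal_score(300, False, 1971)
print(strong_earlier, weak_earlier)
```

A 100-point difference in the earlier score corresponds to a difference of only about 55 points in the fitted renewal score, consistent with the low correlation between the two judgments discussed earlier in this section.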


Table 3

EFFECT OF YEAR ON PRIORITY SCORE RECEIVED ON RENEWAL APPLICATIONSᵃ

                                    Estimate
Number of data pointsᵃ              2504
r²                                  0.14
Constant term                       123.5
Earlier priority score              0.546ᵇ    (20.1)
Earlier application is renewal      -7.10ᶜ    (2.66)
Trend                               0.845     (0.7)

ᵃ This table uses only renewal applications for which the previous competing
application was funded in FY 1967 or later.

ᵇ Significant at 0.001 level.

ᶜ Significant at 0.01 level.

t-statistics are in parentheses.


Both  the  rate  at  which  grants  are  disapproved  and  the  priority  score  received 
on  renewal  applications  reflect  an  improvement  in  the  average  quality  of  the  grants 
appearing  for  renewal  each  year.  The  average  priority  score  that  each  year’s  renew¬ 
al  application  received  the  previous  time  it  was  evaluated  declines  over  time  in  a 
similar  fashion.  This  means  that  the  later  study  sections,  although  composed  for  the 
most  part  of  different  people,  verify  the  earlier  study  sections’  selection  of  the  set 
of  grants  that  were  awarded  good  enough  priority  scores  to  fund.  It  is  a  clear 
indication  that  the  concept  of  "scientific  merit”  contains  enough  objective  content 
that  different  groups  of  people  meeting  several  years  apart  will  agree  that  one  set 
of  grants  is  more  scientifically  meritorious  than  another.  The  low  correlation  coeffi¬ 
cient  between  the  priority  scores  received  on  different  occasions  by  individual  grants 
testifies  both  to  the  uncertain  nature  of  scientific  enterprises  and  to  the  willingness 
of  study  sections  to  make  critical  examinations  of  applications  even  from  well- 
established  investigators. 


⁷ Replacing the trend term by dummies for each year shows no significant effect of any year.


DIFFERENCE  BETWEEN  JUDGMENTS  OF  NEW  AND 
RENEWAL  APPLICATIONS 

The preceding regression shows a difference in the renewal score depending on
whether the earlier application was a new or a renewal application. Assume
that  the  score  received  on  a  renewal  application  represents  a  judgment  of  the  quality 
of  the  research  performed  during  the  preceding  grant  period.  Then  one  can  ask  what 
is  the  difference  in  priority  scores  between  new  and  renewal  applications  after  one 
controls  for  the  quality  of  the  research — that  is,  the  second  score.  I  thus  regress 

$X_1 = a_0 + a_1 S + a_2 X_2 + e$ .

The regression results are shown in Table 4. After one controls for the subsequent
priority score, renewal grants receive a score that is on average about 8.7 points
worse than the score received by new applications.
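
The mechanics of a regression with a 0/1 renewal dummy can be sketched as follows. This is a minimal illustration only: the data are synthetic and noise-free, generated from the coefficients reported in Table 4 (they are not the report's data), and the helper names `solve` and `ols` are hypothetical. The point is that the dummy's coefficient is the gap between renewal and new applications once the subsequent score is held fixed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients of y on the given regressors, with an intercept added."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic illustration (NOT the report's data): earlier scores built exactly
# from Table 4's coefficients, so the fit should return those coefficients.
later_scores = [150, 200, 250, 300, 180, 220, 260, 310]   # S: score on subsequent renewal
was_renewal = [0, 0, 0, 0, 1, 1, 1, 1]                    # 1 if earlier application was a renewal
earlier_scores = [136.22 + 0.266 * s + 8.676 * d
                  for s, d in zip(later_scores, was_renewal)]

a0, a1, a2 = ols(list(zip(later_scores, was_renewal)), earlier_scores)
print(round(a0, 2), round(a1, 3), round(a2, 3))
```

Because higher priority scores are worse, a positive coefficient on the dummy means renewal applications scored worse than new ones at the same subsequent score.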

It  is  interesting  to  consider  this  apparent  bonus  to  new  applications  in  light  of 
the uncertain nature of many scientific research projects. It is likely that the scientific
outcome of a new research project is more uncertain than the outcome of a
renewal  research  project.  There  may  be  a  greater  chance  that  a  new  project  will 
achieve  the  kind  of  scientific  result  that  can  be  termed  a  breakthrough  and  a  greater 
chance  that  the  project  will  lead  into  a  dead  end.  If  so,  then  it  may  well  be  that  the 
apparent  bonus  received  by  new  applications  represents  an  assessment  of  the  value 
of  a  "breakthrough”  compared  with  the  value  of  the  more  mundane  sort  of  work  that 
constitutes  the  normal  progress  of  science. 


Table 4

PRIORITY SCORE ASSIGNED TO RENEWAL
COMPARED WITH NEW APPLICATIONS

Number of samples                          2504
r^2                                        0.15
Constant term                              136.22
Priority score on subsequent renewal       0.266^a
                                           (20.5)
Renewal                                    8.676^a
                                           (4.7)

^a Significant at the 0.001 level.
t-statistics are in parentheses.


JUDGMENT OF NEW APPLICATIONS

One  question  of  interest  is  whether  the  judgment  of  new  applications  has 
changed  over  time.  Figures  3  and  4  show  that  the  average  priority  score  remained 
constant in the 1968-1972 period and rose in 1973, while the fraction of new applications
that were recommended for disapproval declined. However, the quality of
new  applications  may  also  have  changed  during  this  period.  If  an  investigator  who 
currently  has  a  grant  is  less  likely  to  apply  for  a  new  grant  than  he  would  be  if  he 
did  not  have  a  grant,  then  as  funding  levels  declined,  the  number  of  good  nonfunded 
investigators  would  increase  and  the  quality  of  new  applications  would  improve. 

Another  question  of  some  general  interest  is  the  extent  to  which  the  judgments 
of  the  study  section  reflect  their  opinion  of  the  individual  principal  investigator  as 
distinct from their opinion of the particular project. Certainly the value of a research
project must be related to the scientific ability and qualifications of the
principal  investigator,  but  one  would  imagine  that  a  particular  investigator  could 
apply  for  projects  of  varying  degrees  of  merit.  The  judgments  of  the  study  section 
should  reflect  opinions  of  both  the  individual  and  the  research  proposal.  Therefore, 
we  are  interested  in  the  relationship  between  the  study  section’s  judgment  of  a  new 
application  and  the  judgments  of  earlier  applications  from  the  same  individual.  For 
each  new  application,  I  use  all  earlier  new  and  competing  renewal  applications 
received  from  the  same  principal  investigator.  Because  some  time  is  needed  for  these 
histories  to  accumulate,  I  consider  only  applications  received  for  FY  1971  and  later 
for  which  at  least  one  preceding  application  had  received  a  priority  score.  The 
priority  score  received  on  the  new  application  is  assumed  to  be  given  by: 

$a_0 + a_1 f + a_2 S + a_3 Z_{72} + a_4 Z_{73} + e$ ,

where f = fraction of preceding applications that were disapproved,

S = average priority score on earlier applications,

Z_i = 1 if the new application was for year i and 0 otherwise.

A  logit  regression  of  the  probability  that  the  new  application  will  be  disapproved  was 
also  run  on  the  same  equation. 

Table  5  shows  the  results  of  both  regressions.  In  both  cases  the  judgment  of  the 
study  sections  is  significantly  related  to  earlier  judgments  of  the  same  individual. 
The  judgments  made  in  1972  are  indistinguishable  from  those  made  in  1971,  but  in 
1973  grants  had  a  greater  chance  for  approval  and  a  worse  average  priority  score. 

The correlation coefficient between the priority score received on a new application
and the average score received on all earlier applications is 0.35. This is large
enough  for  certainty  that  there  is  a  statistical  relationship  between  the  merits  of 
successive  proposals  by  the  same  individual  as  judged  by  study  sections.  However, 
it  also  is  small  enough  to  allow  the  major  part  of  the  variance  in  scores  to  be 
explained  by  the  difference  in  the  merits  of  the  different  projects  proposed  by  the 
same  individual. 
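
A short worked calculation may make the "small enough" argument concrete. In a simple linear regression with one predictor, the share of variance explained equals the square of the correlation coefficient. The sketch below applies that standard identity to the two correlations quoted in this section; the function name is hypothetical.

```python
def variance_explained(r):
    """Share of variance explained by one predictor with correlation r (simple regression)."""
    return r * r


# Correlations quoted in the text.
new_vs_earlier = variance_explained(0.35)    # new application vs. earlier applications
renewal_vs_funded = variance_explained(0.38)  # renewal vs. earlier funded application

print(f"{new_vs_earlier:.3f} explained, {1 - new_vs_earlier:.3f} unexplained")
print(f"{renewal_vs_funded:.3f} explained, {1 - renewal_vs_funded:.3f} unexplained")
```

A correlation of 0.35 therefore accounts for roughly 12 percent of the variance, leaving most of the variation in scores to other sources such as the merits of the particular project.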


Table 5

INITIAL REVIEW GROUP JUDGMENT OF NEW APPLICATIONS

                                       Dependent Variable
                              ------------------------------------
                              Priority          Probability of
                              Score             Disapproval

Number of samples             2460              3522
                              r^2 = 0.13        χ² = 165.2 [3 df]

Fraction disapproved          35.4^a            1.46^a
                              (5.6)             (9.9)
Average priority score        0.371^a           0.0028^a
                              (16.7)            (5.2)
Year 1972                     4.68              -0.085
                              (1.2)             (0.9)
Year 1973                     10.26^b           -0.296^a
                              (2.7)             (3.2)

NOTE: Numbers in parentheses are t-statistics for
the priority score regression and asymptotic t-ratios
for the logit regression on the probability of disapproval.

^a Significant at the 0.001 level.

^b Significant at the 0.01 level.


RELATION BETWEEN RENEWAL JUDGMENT AND
JUDGMENTS OF OTHER APPLICATIONS

The correlation coefficient between the priority score received by a funded application
and the score received on a subsequent renewal application is 0.38. My
interpretation of this low value is that it is due to the inherent uncertainty involved
in previous judgments of which proposals will constitute the most meritorious projects.
Under this interpretation, the part of the variance in the second score explainable
by the earlier priority score represents the power of the study sections to predict
which  projects  will  be  successful  enough  to  merit  continuation.  Presumably  this 
prediction  incorporates  a  judgment  of  both  the  individual’s  general  ability  and  the 
particular  project.  The  correlation  between  an  individual’s  previous  scores  and  his 
score  on  a  new  application  is  of  the  same  order  of  magnitude  as  the  correlation 
between  scores  on  succeeding  applications  for  the  same  grant.  This  might  make  one 
wonder whether the correlation between renewal and earlier scores actually represents
a judgment of the particular project in addition to a judgment of the principal
investigator’s ability. One way to look at this question is to examine the relationship
between the score on the renewal application and the score on the earlier application
for the same project, while controlling for the judgments of study sections on
earlier  applications  for  other  projects  from  the  same  principal  investigator.  Again 
applications  are  restricted  to  those  for  1971  and  later  and  to  applications  from 
principal  investigators  who  had  applied  for  and  received  a  priority  score  on  at  least 
one  other  project  earlier  than  the  renewal  application. 

I regress the priority score assigned to the renewal application on the priority
score assigned to the earlier application for the same project, a dummy variable
that is 1 if that earlier application was a renewal, the fraction of other
applications that were disapproved, and the average priority score
received on other applications. The result of this regression is shown in Table 6. The
earlier  score  for  the  same  project  is  a  more  important  explanatory  variable  for  the 
second  score  than  the  judgments  of  study  sections  about  the  worth  of  other  projects 
by the same principal investigator. For this subset of grants, the proportion of the
variance in the second score of this set of applications that can be explained by the
earlier priority score alone is 0.21, and the part that can be explained by a combination
of both score and fraction disapproved on earlier applications is 0.079. The study
section’s judgment of each application clearly reflects the quality of both the principal
investigator and the particular research proposal.


Table 6

RENEWAL PRIORITY SCORE AS FUNCTION
OF EARLIER SCORES

Number of samples                               898
r^2                                             0.25
Constant term                                   77.74
Earlier score on same project                   0.469^a
                                                (13.3)
Average score on other projects                 0.193^a
                                                (5.7)
Fraction of other applications disapproved      16.25
                                                (1.6)
Earlier application renewal                     -0.19
                                                (0.05)

^a Significant at the 0.001 level.
t-statistics are in parentheses.


SUMMARY 

Both  the  rate  at  which  grants  are  disapproved  and  the  priority  score  received 
on  renewal  applications  reflect  an  improvement  in  the  average  quality  of  the  grants 
appearing  for  renewal  each  year.  This  means  that  the  later  study  sections,  though 
composed  for  the  most  part  of  different  people,  verify  the  earlier  study  sections’ 
selection  of  the  set  of  grants  that  were  awarded  good  enough  priority  scores  to  fund. 
The concept of "scientific merit" obviously contains enough objective content that
different groups of people meeting several years apart will agree that one set of
grants is more scientifically meritorious than another set of grants. The low correlation
coefficient between the priority scores received on different occasions by individual
grants testifies to both the uncertain nature of scientific enterprises and the
willingness  of  study  sections  to  make  critical  examinations  of  applications  from  even 
well-established  investigators. 

The  rate  at  which  applications  for  renewal  are  disapproved  appears  to  have 
declined  between  1968  and  1972,  but  there  is  no  trend  evident  in  the  priority  scores 
received  on  approved  renewal  applications  during  this  period.  The  judgment  of 
applications  for  new  projects  reflected  a  lower  rate  of  disapproval  and  a  worse 
average  priority  score  in  FY  1973  than  in  FY  1971-1972. 

Judgments of a new application are statistically significantly related to judgments
of earlier applications by the same individual. However, the relationship is
weak enough to allow the largest part of the variance in the priority score to be
explained by the merit of the project. The judgment of a renewal application is also
related to the judgments of the study sections on applications from the same applicant
for other projects, but it is more strongly related to the priority score received
on the previous application for the same grant.


III.  METHODOLOGY  FOR  SAMPLING  GRANTS  AND 
OBTAINING  CITATION  DATA 


To  explore  the  use  of  citations  as  a  measure  of  research  output,  I  chose  a  sample 
of  747  research  project  grants  and  51  program  project  grants  awarded  on  a  competi¬ 
tive  basis  in  FY  1967.  References  to  all  publications  produced  under  each  of  these 
grants and listed in the Research Grants Index for FY 1967 to FY 1970 were retrieved.
We purchased, from the Institute for Scientific Information, a record of all
the citations of these publications that had been recorded in the Science Citation
Index for the years 1968 to 1972. This section describes the selection of the grants and
the  retrieval  of  the  citation  data. 

Biomedical research is performed under the auspices of several of the organizational
entities that constitute the academic health center: the medical school, its
affiliated hospitals, and, in certain medical schools, corporately separate research
institutes. Grants awarded to these entities (the awards are usually to members of
a medical school faculty) are the subject of this study and were candidates for
inclusion in the sample. Hospitals were restricted to those listed in The Directory
of Internships and Residencies as having a "major" affiliation with a medical school
in  1970. 

To  allow  the  longest  time  possible  for  citations  to  occur,  the  sample  of  grants  is 
restricted to those awarded in FY 1967, the earliest year for which the IMPAC data
were available. Projects funded in FY 1967 have starting dates between July 1966
and June 1967. The sample is restricted to research project grants (designated R01
in the IMPAC file) and program project grants (designated P01 in the IMPAC file).
Although  there  is  policy  interest  in  additional  funding  mechanisms,  particularly  the 
new  Center  programs  of  the  various  Institutes,  too  few  of  these  programs  were 
funded  in  1967  to  permit  comparison. 

Because  I  wished  to  examine  the  relationship  between  priority  score  and  citation 
measures,  I  restricted  the  sample  to  grants  reviewed  and  awarded  a  priority  score 
by  a  regular  study  section  of  the  Division  of  Research  Grants.  In  1967,  NIH  awarded 
1864 such research project grants to medical schools, affiliated hospitals, and research
organizations on a competitive basis (excluding continuations). Since I am
considering  only  grants  to  medical  schools,  I  also  omitted  grants  from  the  dental 
study  section  and  from  six  other  study  sections  for  which  only  nine  or  fewer  grant 
applications  are  available.  The  numbers  of  grants  to  be  omitted  in  each  category  are 
shown  in  Table  7. 

A 45 percent sample (747 of the 1675 remaining grants) was chosen. The sample
included all the research grants awarded to the ten medical schools originally
chosen (in the larger study) to be representative of all medical schools in the United
States.^1 The remaining grants were chosen to supplement these grants through
random  samples  stratified  first  on  Initial  Review  Group  and  then  on  priority  score 
within  the  Initial  Review  Group. 

^1 See Williams, 1974, for an explanation of the selection procedure.


Table 7

RESEARCH PROJECT GRANTS OMITTED FROM SAMPLE

                                                   Number of
                                                   Grants

Initial Review Group field contained:
  Special study section                               31
  No Initial Review Group                             27
  Study sections for which there are
    not enough applications^a                         31
  Unidentifiable study section                        67
Special review codes:
  Internal review                                      4
  No initial review                                    6
  Ad hoc review                                       15
No priority score recorded on file                    35

Total omitted grants                                 189
Total number of grants                              1864
Available for sampling                              1675

^a Omitted study sections: Dental, Epidemiology and Disease Control, Experimental
Psychology, History of Life Sciences, Medicinal Chemistry A, Medicinal Chemistry B,
and Toxicology.


Because only 51 program project grants were awarded to medical schools on a
competitive basis in 1967, the entire set of such grants was included in the sample.
In  an  attempt  to  capture  the  results  of  as  much  as  possible  of  the  research  produced 
under  these  grants,  all  publications  listed  in  the  Research  Grants  Index  FY  1967 
to  1970  were  retrieved.  There  are  delays  in  publication  in  many  good  journals,  so 
it  is  certain  that  some  of  the  research  results  have  been  omitted. 

Table  8  lists  the  numbers  of  publications  retrieved.  The  inclusion  of  publications 
from  1966  deserves  some  comment.  Practically  all  of  these  publications  came  from 
renewal  applications  (46  percent  of  the  grants  in  the  sample),  and  it  is  at  least 
possible  (although  unlikely)  that  the  research  reported  in  the  1966  publications  was 
performed  during  the  first  year  of  the  renewal  of  the  grant.  However,  since  it  is  a 
common  practice  to  have  begun  research  on  a  project  before  a  grant  is  awarded,  I 
included  the  1966  publications  as  part  of  the  research  project  that  was  evaluated  by 
the  Initial  Review  Group. 
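
The counts and citation rates in Table 8 can be cross-checked against its "Total citations" row: for each year, the sum of (number of publications × citations per publication) over the four publication kinds should reproduce the reported total up to rounding. The sketch below does this arithmetic. One assumption is flagged in the code: the 1970 rate for journal articles is taken from Table 9 (7.29).

```python
YEARS = [1966, 1967, 1968, 1969, 1970]

# Number of publications by kind and year, as listed in Table 8.
counts = {
    "journal articles":    [624, 1143, 1264, 1065, 255],
    "books":               [29, 60, 109, 106, 66],
    "talks and abstracts": [97, 270, 369, 248, 51],
    "theses":              [4, 7, 7, 8, 1],
}

# Citations per publication by kind and year, as listed in Table 8.
# Assumption: the 1970 journal-article rate (7.29) is taken from Table 9.
rates = {
    "journal articles":    [11.18, 10.41, 8.58, 7.56, 7.29],
    "books":               [3.07, 2.77, 3.45, 1.43, 1.86],
    "talks and abstracts": [0.83, 0.82, 0.76, 0.70, 0.53],
    "theses":              [0.50, 0.57, 0.43, 1.25, 0.0],
}

reported_totals = [7148, 12290, 11505, 8387, 2009]

reconstructed = [
    sum(counts[kind][i] * rates[kind][i] for kind in counts)
    for i in range(len(YEARS))
]

for year, approx, reported in zip(YEARS, reconstructed, reported_totals):
    print(year, round(approx, 2), reported)
```

The small gaps between the reconstructed and reported totals are what one would expect from rates rounded to two decimal places.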

The  bibliographic  information  available  in  the  Research  Grants  Index  readily 
identified books, theses, and many oral presentations at scientific meetings. However,
some journals are entirely devoted to publishing abstracts of such presentations,
and other journals devote sections to this use. These abstracts have a much
lower  citation  rate  than  do  regular  journal  articles  as  can  be  seen  in  Table  8.  The 
bibliographic  citation  in  the  Research  Grants  Index  often  gave  no  indication  as  to 
whether a publication was an entire article or an abstract. Since it was impossible
to examine each of the journal publications, only those known to be abstracts have
been placed in that category.^2


Table 8

KINDS OF PUBLICATIONS FROM SAMPLE OF NIH RESEARCH GRANTS

Publication               1966     1967     1968     1969     1970    Total

Journal articles
  Number                   624    1,143    1,264    1,065      255    4,351
  Citations/article      11.18    10.41     8.58     7.56     7.29
Books
  Number                    29       60      109      106       66      370
  Citations/book          3.07     2.77     3.45     1.43     1.86
Talks and abstracts
  Number                    97      270      369      248       51    1,035
  Citations/talk          0.83     0.82     0.76     0.70     0.53
Theses
  Number                     4        7        7        8        1       27
  Citations/thesis        0.50     0.57     0.43     1.25      0.0

Total publications         754    1,480    1,749    1,427      373    5,783
Total citations          7,148   12,290   11,505    8,387    2,009   41,339

Much  research  is  produced  jointly  under  two  or  more  grants  from  NIH  or  other 
private or federal agencies. Medical schools also contribute to the support of research
through the salaries of faculty and occasionally other personnel. It is impossible
to attribute parts of the research results to each of these joint sources of support;
therefore,  I  analyze  the  data  as  if  the  research  were  the  product  of  only  a  single  NIH 
grant.  In  cases  where  the  Research  Grants  Index  attributed  a  single  publication  to 
more  than  one  grant  in  the  sample,  the  article  is  treated  as  if  separate  articles  had 
been  produced  by  each  grant.  This  introduces  no  more  distortion  into  the  analysis 
than  there  is  already  for  articles  produced  through  one  grant  in  the  sample  and 
other  unknown  sources  of  support. 

Citations  to  the  articles  were  retrieved  by  the  Institute  for  Scientific  Information 
from  its  Science  Citation  Index  tape  file.  The  method  used  by  ISI  to  retrieve  the 
citations  is  described  by  Jack  Byk,  Coordinator  of  Corporate  Information  Systems 
for ISI:^3

(a) We [ISI] received 264 pages with double spaced entries which were keypunched
and put on magnetic tape. We keypunched 5937 cards^4 containing
the information which Rand furnished us, consisting of a Rand number,
cited author, journal, volume, page and year. Theses and book reviews were
keypunched in a manner consistent with ISI’s processing to assure proper
matching. Cited author’s names were also keypunched in the ISI format
with truncation of surnames, contraction of names, etc., to assure proper
matching.

^2 The set of talks and abstracts includes all publications in Clinical Research, Pharmacologist, and
Physiologist; supplements to Circulation; abstracts in Federation Proceedings; and all publications that
were listed in the Research Grants Index as either talks or abstracts.

^3 Private communication.

^4 The difference between these 5937 entries and the 5783 publications listed in Table 8 is due to
unintentional duplications of publications on our manually produced list of articles.

(b) The 5937 records were matched against the ISI Citation Index tapes for the
years 1968, 1969, 1970, 1971, and 1972, approximately 20.7 million records.
The first match was performed on 6 characters of the author’s name, 1
character of the cited journal and the year. This created a sub-file of approximately
225,000 records.


Year       ISI Total        Rand
           Citations        Pre-selected

1968       3,644,470         22,967
1969       3,961,720         38,701
1970       4,134,156         48,674
1971       4,323,030         52,843
1972       4,659,115         62,633

Total     20,722,491        225,818

(c)  The  Rand  file  of  5937  records  and  the  225,000  records  were  sorted  into 
sequence  on  18  characters  of  the  cited  author  name  (full  name  field),  1 
character  of  the  cited  journal,  and  the  volume,  page  and  year.  A  total  of 
39,098  records  were  matched  by  this  method.  A  tape  of  all  unmatched 
records  (approximately  185,000)  was  also  created.  It  should  be  noted  that 
only  the  first  character  of  the  journal  was  used  in  a  comparison  due  to  the 
unreliability of spelling cited journals past the first character. Such variations
of spelling are to be found in the list which Rand furnished to ISI
where  the  same  article  by  the  same  author  for  the  same  volume,  page  and 
year  is  given  with  a  different  cited  journal  spelling.  (For  example:  a  cited 
journal  is  spelled  as  "J  Clin  End”  and  as  "J  Clin  Endocr.”) 

(d)  The  unmatched  records  mentioned  above  were  matched  against  the  5937 
Rand  records  on  6  characters  of  the  author’s  name,  1  character  of  the 
journal  and  the  volume,  page  and  year.  By  using  this  method,  ISI  matched 
an additional 1734 records.^5 These records are matched although there
may  be  slight  differences  in  the  spelling  of  the  author’s  name,  usually  due 
to  a  difference  in  the  number  of  initials  given  with  the  surname.  However, 
there  are  also  variations  in  the  surname,  due  either  to  an  error  in  Rand’s 
original  list  or  in  the  spelling  of  a  reference  extracted  by  ISI  from  the 
original  source  journals. . . .  An  additional  five  citations  were  extracted  by 
visually  checking  the  Rand  list  against  potential  matches  where  the 
volume  or  page  or  both  were  blank. 
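
The matching passes described in (b) through (d) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not ISI's actual program: the record layout (`Ref`), the function names, and the sample records (including the author "SMITH JA") are all hypothetical. Only the key widths (6 or 18 characters of the author name, 1 character of the journal) come from the description above.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """One reference record; field layout is hypothetical."""
    author: str
    journal: str
    volume: str
    page: str
    year: str


def coarse_key(ref):
    # Step (b): 6 characters of author, 1 character of journal, and year.
    return (ref.author[:6], ref.journal[:1], ref.year)


def full_key(ref, name_chars):
    # Steps (c) and (d): author prefix, 1 character of journal, volume, page, year.
    return (ref.author[:name_chars], ref.journal[:1], ref.volume, ref.page, ref.year)


def match_citations(wanted, citations):
    """Return (strict, loose) matches of citations against the wanted list."""
    coarse = {coarse_key(r) for r in wanted}
    subfile = [c for c in citations if coarse_key(c) in coarse]           # step (b)
    strict_keys = {full_key(r, 18) for r in wanted}
    strict = [c for c in subfile if full_key(c, 18) in strict_keys]       # step (c)
    unmatched = [c for c in subfile if full_key(c, 18) not in strict_keys]
    loose_keys = {full_key(r, 6) for r in wanted}
    loose = [c for c in unmatched if full_key(c, 6) in loose_keys]        # step (d)
    return strict, loose


# Hypothetical records for illustration only.
wanted = [Ref("SMITH JA", "J Clin Endocr", "27", "100", "1967")]
citations = [
    Ref("SMITH JA", "J Clin End", "27", "100", "1967"),     # journal spelled differently
    Ref("SMITH J", "J Clin Endocr", "27", "100", "1967"),   # one initial missing
    Ref("SMITH JA", "Nature", "27", "100", "1967"),         # different journal initial
    Ref("SMITH JA", "J Clin Endocr", "27", "215", "1967"),  # different page
]
strict, loose = match_citations(wanted, citations)
print(len(strict), len(loose))
```

Comparing only the first character of the journal is what lets "J Clin End" and "J Clin Endocr" match in the strict pass, while the shorter author key in the second pass recovers records that differ only in the number of initials.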

The  match  described  in  paragraph  (c)  above  was  performed  twice  for  articles 
published  as  chapters  or  sections  of  books.  In  one  pass,  the  page  number  of  the 
chapter  was  used  and  in  the  other  pass  it  was  left  blank.  This  resulted  in  retrieving 
all  citations  to  the  author  and  book  without  a  page  number  and  all  citations  to  the 
author and book that referenced the first page of the chapter. However, since references
are frequently made to pages of a book internal to chapters and these citations
were  not  retrieved,  the  accuracy  of  the  citation  data  for  books  is  much  lower  than 
that  for  journal  articles. 


^5 These matches were obtained because of the reduction from 18 to 6 characters of the author’s name.


IV.  MOST-CITED  ARTICLES 


A  natural  way  to  begin  exploring  the  value  of  citations  as  a  measure  of  research 
output  is  to  examine  articles  that  were  cited  most  frequently.  I  first  describe  the 
selection  of  the  most-cited  articles  and  then  show  that  in  general  a  grant  under 
which  a  most-cited  article  was  published  was  subsequently  judged  by  a  study  section 
to  be  more  valuable  than  other  grants  in  the  sample.  The  NIH  awarding  system 
perceived  that  these  grants  were  more  likely  than  others  to  produce  exceptionally 
useful  results  when  the  funding  decision  was  made,  before  the  publication  of  these 
articles. 

Of the publications produced by the NIH grants in the sample, 4351 were classified
as journal articles, 370 as books, 1035 as talks given at professional meetings,
and  27  as  theses.  Since  the  number  of  citations  is  markedly  different  for  each  of  these 
kinds  of  publications,  I  discuss  only  journal  articles  in  this  section. 

Table 9 shows the number of journal articles in the sample by year of publication.
The number of citations per article varies by year because only citations that
occurred  between  1968  and  1972  are  included. 


Table 9

JOURNAL ARTICLES PUBLISHED UNDER NIH GRANTS IN SAMPLE

                                  1966     1967     1968     1969     1970

Number of articles                 624     1143     1264     1065      255
Average number of
  citations per article          11.18    10.41     8.58     7.56     7.29
Standard deviation               19.59    16.07    14.83    12.38     9.34
Maximum number of
  citations per article            190      196      206      121       66

The  distribution  of  the  number  of  citations  per  article  is  plotted  in  Fig.  5  for 
articles  published  in  1967.  There  were  five  or  fewer  citations  for  50  percent  of  the 
articles,  but  there  is  a  very  long  tail.  The  general  shape  of  the  distribution  is  similar 
for  other  years. 

I  select  the  most-cited  five  percent  of  the  articles  published  each  year,  a  total  of 
225  articles.  The  journals  in  which  they  were  published  are  listed  in  Table  10. 

Two  of  these  articles  were  listed  in  the  Research  Grants  Index  as  having  been 
produced  under  two  separate  grants  each.  In  Table  11,  which  shows  the  number  of 
most-cited  articles  per  grant,  each  of  these  two  articles  is  double  counted.  Three  of 
these  four  grants  also  produced  at  least  one  other  article  in  the  most-cited  category, 
so  the  double  counting  is  not  important.  These  225  most-cited  articles  were  produced 
by  116  grants,  or  14.5  percent  of  the  grants  in  the  sample. 
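
The figures in this paragraph can be checked directly against the distribution in Table 11. The sketch below tallies the table; the variable names are hypothetical. It confirms the 116 grants and the 14.5 percent share, and shows that the table credits 227 articles, that is, the 225 most-cited articles plus the two that are double counted.

```python
# Table 11: number of most-cited articles -> number of grants.
grants_by_article_count = {
    0: 682, 1: 66, 2: 25, 3: 13, 4: 6, 5: 2, 7: 1, 8: 1, 9: 1, 14: 1,
}

total_grants = sum(grants_by_article_count.values())
grants_with_most_cited = total_grants - grants_by_article_count[0]
article_credits = sum(k * n for k, n in grants_by_article_count.items())
share_percent = 100 * grants_with_most_cited / total_grants

print(total_grants, grants_with_most_cited, article_credits, round(share_percent, 1))
```

The total of 798 grants agrees with the sample of 747 research project grants and 51 program project grants described in Section III.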


Fig. 5 - Histogram of citations to 1,143 journal articles published in 1967
(horizontal axis: citations; vertical axis: articles; the 95th percentile is marked)


MOST-CITED  ARTICLES  AND  JUDGMENTS  OF  SUBSEQUENT 
RENEWAL  APPLICATIONS 

I  hypothesize  that  most  of  the  116  grants  that  produced  highly  cited  articles 
were  exceptionally  useful  pieces  of  research.  I  cannot  and  do  not  claim  that  each  of 
these  grants  produced  more  useful  research  than  each  of  the  grants  not  so  cited. 
However,  I  hope  to  produce  evidence  that  the  results  of  this  set  of  grants  were 
generally  more  useful  than  the  results  of  the  average  grant  that  was  not  highly 
cited.  It  will  then  be  possible  to  discover  those  characteristics  of  grants  that  are 
likely  to  produce  exceptionally  useful  research  results.  For  a  subset  of  these  grants 
I  examine  the  hypothesis  that  the  highly  cited  grants  were  also  exceptionally  useful 
by  using  the  judgment  of  the  Initial  Review  Group  on  the  first  competing  renewal 
application  received  after  FY  1967. 

In  Tables  12-24  of  this  section  the  set  of  grants  that  had  at  least  one  article 
among  the  most-cited  5  percent  are  labeled  as  "Group  1”  and  the  remaining  grants 
as  "Group  2.”  Table  12  shows  the  number  of  grants  in  each  group  for  which  a 
competing  renewal  application  was  made  after  1967.  The  proportion  of  grants  with 
a renewal application is higher in Group 1 than in Group 2. Table 12 also shows the
disposition of the renewal applications. Of those applications that came to a vote by
an IRG, 20.7 percent of Group 2 applications were recommended for disapproval, but
only 9.3 percent of the applications from Group 1 were recommended for disapproval.
This difference is significant at the 0.02 level.


Table 10

JOURNALS THAT PUBLISHED THE MOST-CITED ARTICLES

                                                      Number of
                                                      Most-Cited
Journal                                               Articles

J. Clinical Investigation                                22
Science                                                  14
J. Biological Chemistry                                  12
J. Pharmacology and Experimental Therapeutics            10
J. Cell Biology                                           9
J. Experimental Medicine                                  9
New England J. of Medicine                                8
Biochimica et Biophysica Acta                             6
J. Clinical Endocrinology and Metabolism                  6
American J. of Cardiology                                 5
American J. of Medicine                                   4
Circulation Research                                      4
J. Immunology                                             4
J. Molecular Biology                                      4
Lancet                                                    4
Nature                                                    4
Proceedings of The National Academy of Science            4
American J. of Physiology                                 3
Archives of Internal Medicine                             3
Biochemistry                                              3
Genetics                                                  3
J. American Medical Association                           3
J. Bacteriology                                           3
J. Physiology                                             3
Molecular Pharmacology                                    3
17 other journals each had                                2
38 other journals each had                                1

Number of journals                                       80
Number of articles                                      225


Table 11

NUMBERS OF MOST-CITED JOURNAL
ARTICLES PER GRANT

Most-Cited        Number of
Articles          Grants

 0                  682
 1                   66
 2                   25
 3                   13
 4                    6
 5                    2
 7                    1
 8                    1
 9                    1
14                    1


Table 12

NEXT COMPETING RENEWAL APPLICATION AFTER 1967:
GROUP 1 VERSUS GROUP 2

                                     Group 1      Group 2

Applied for competing renewal           86          435
Did not apply                           30          247
Total                                  116          682

χ² = 4.69 [1 df]

Result                               Group 1      Group 2

Withdrawn                                0            4
Deferred and no later judgment           0            2
Disapproved                              8           89
Approved                                78          340
Total                                   86          435

χ² = 6.14 [1 df]

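
The two test statistics reported with Table 12 (4.69 and 6.14, each with 1 degree of freedom) can be reproduced from the counts. The sketch below assumes they are Pearson chi-square statistics for 2×2 tables without a continuity correction; the report does not state the formula, but this assumption matches the printed values. The function names are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Rows are Group 1 and Group 2; counts are from Table 12.
applied = chi_square_2x2([[86, 30], [435, 247]])   # applied vs. did not apply
voted = chi_square_2x2([[8, 78], [89, 340]])       # disapproved vs. approved

print(round(applied, 2), round(voted, 2), round(p_value_1df(voted), 3))
```

The second table uses only applications that came to a vote (86 in Group 1 and 429 in Group 2), which is also the base for the 9.3 and 20.7 percent disapproval rates quoted above.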

Table 13 compares the priority scores^1 received on the approved renewal applications
in each group. Here the difference between applications in the two groups is
readily apparent. Only 10.3 percent of the Group 1 applications received a worse
than average score, but 42.1 percent of the applications in Group 2 received worse
than average scores. The difference between the means is significant at the 0.001
level.
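
The t-statistic of 5.55 reported in Table 13 can be reproduced from the group means, standard deviations, and sample sizes. The sketch below assumes a two-sample t-test with a pooled variance; the report does not name the variant, but the pooled form matches the printed value. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error


# Group 1: 78 approved renewals; Group 2: 340 approved renewals (Table 13).
t_stat = pooled_t(-0.663, 0.676, 78, -0.015, 0.979, 340)
print(round(t_stat, 2))
```

Because a numerically lower normalized score is a better score, the positive statistic here reflects Group 1's better average.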

A  proposal  to  renew  the  average  grant  in  Group  1  was  viewed  by  the  Initial 
Review  Groups  as  having  more  scientific  merit  than  a  similar  application  from 
Group  2.  The  leap  from  this  observation  to  a  statement  that  the  research  results  of 
the  grants  in  Group  1  were  on  the  average  more  valuable  than  the  results  of  the 
other  grants  is  open  to  two  objections.  The  first  is  that  the  correlation  between  the 
merit  of  the  follow-up  proposal  and  the  worth  of  the  preceding  work  might  be  low. 
The  second  is  that  although  the  correlation  may  be  high,  our  citation  measure  might 
not  capture  the  merit  of  the  work.  Since  the  judgments  of  the  study  sections  on  the 
renewal  application  clearly  show  that  Group  1  grants  were  to  be  preferred  to  Group 
2  grants,  to  hold  either  of  these  objections  requires  the  belief  that  this  difference  can 
be  explained  by  other  factors  that  have  not  been  considered. 

One  way  to  analyze  these  objections  is  to  consider  the  relationship  between  the 
priority  score  received  on  the  renewal  application  and  two  independent  variables: 
the  priority  score  received  in  1967,  and  having  produced  at  least  one  most-cited 
article.  The  form  of  the  relationship  is  assumed  to  be: 

$$y = a_0 + a_1 X_1 + a_2 X_2 + e ,$$

* In order to account for the difference between study sections, the renewal score, s, is normalized to
(s - μ)/σ, where μ and σ are the sample mean and standard deviation of all the priority scores awarded
by the same study section over a year’s time.




Table 13

NORMALIZED PRIORITY SCORE RECEIVED ON FIRST COMPETING
APPLICATION RECEIVED AFTER 1967^a


Normalized Score      Group 1   Percent    Group 2   Percent

(-∞, -1)                 25       32.0        49       14.4
(-1, 0)                  45       57.7       148       43.5
(0, 1)                    6        7.7        84       24.7
(1, ∞)                    2        2.6        59       17.4
Total                    78      100.0       340      100.0

Average score              -0.663               -0.015
Standard deviation          0.676                0.979

t-statistic = 5.55

^a Following the convention of the untransformed priority
score, a numerically lower normalized score is a better
score.


where  y  =  the  score  received  on  the  subsequent  renewal  application, 

X_1 = the score received in 1967,

X_2 = 1 if the grant was in Group 1, and 0 otherwise.

This  regression  examines  whether  something  happened  to  the  particular  set  of 
grants  that  produced  a  most-cited  article  to  cause  the  study  section’s  judgment  of 
a  renewal  application  to  differ  from  the  judgment  received  in  1967.  If  nothing 
happened,  we  would  expect  a2  to  be  approximately  zero.  If  a2  is  not  zero,  we  know 
something  happened  to  this  set  of  grants  and  the  simplest  explanation  is  that  they 
actually  produced  more  useful  research.  The  results  of  the  regression  are  shown  in 
Table  14.  All  the  coefficients  are  significant  at  the  0.001  level.  The  grants  in  Group 
1  received  scores  on  the  average  47  points  better  than  would  have  been  expected  on 
the basis of the scores they received in 1967. The judgments of renewal applications
by  the  study  sections  confirm  that,  on  the  average,  these  grants  produced  more 
useful  research  results  than  the  other  grants. 


Table 14

REGRESSION OF THE PRIORITY SCORE
RECEIVED ON THE FIRST COMPETING
RENEWAL APPLICATION AFTER 1967


Number of data points            418
r^2                              0.19
a0 (constant)                  132.19
a1 (1967 priority score)         0.460   (7.3)
a2 (Group 1 = 1)               -46.87    (5.6)
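
To make the size of the Group 1 effect concrete, the fitted equation can be evaluated for two grants with the same 1967 score. This is a sketch only: the coefficients are those read from Table 14 (taking the constant as 132.19), while the 1967 score of 200 and the function name are hypothetical choices made for illustration.

```python
# Coefficients from Table 14: constant, 1967 priority score, Group 1 dummy.
A0, A1, A2 = 132.19, 0.460, -46.87

def predicted_renewal_score(score_1967, in_group_1):
    """Predicted renewal priority score; a numerically lower score is better."""
    return A0 + A1 * score_1967 + A2 * (1 if in_group_1 else 0)

score_1967 = 200.0  # hypothetical 1967 priority score
other = predicted_renewal_score(score_1967, in_group_1=False)
most_cited = predicted_renewal_score(score_1967, in_group_1=True)
gap = most_cited - other

print(round(other, 2), round(most_cited, 2), round(gap, 2))
```

Whatever 1967 score is chosen, the gap between the two predictions equals the coefficient a2, about 47 points in favor of the Group 1 grant.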



It  is  worth  noting  that  one  could  turn  this  argument  around  by  assuming  that 
the  production  of  a  highly  cited  article  presents  an  a  priori  case  that  a  grant  was 
useful  to  the  scientific  community.  Given  this  assumption,  the  correlation  between 
the  most-cited  articles  and  the  subsequent  renewal  priority  score  is  evidence  that 
the  peer  groups  are  evaluating  scientific  output.  Since  the  production  of  a  highly 
cited  article  results  in  a  significantly  better  renewal  priority  score  even  after  one 
controls  for  the  priority  score  received  in  1967,  it  would  follow  that  part  of  the  low 
correlation between the scores received on a funded grant and its subsequent renewal
is due to an adaptation by the second study section toward the merit of the
research and away from the earlier appraisal of its potential. Without choosing
between these two alternative hypotheses, one can note that the existence of a
strong correlation between the two research output measures boosts the face validity
of each of them.


CHARACTERISTICS  OF  THE  MOST-CITED  GRANTS 
Type  of  Application 

Table  15  shows  the  number  of  grants  in  each  group  that  were  from  new  and 
renewal  applications  in  1967.  Although  slightly  more  of  the  Group  1  than  of  the 
Group 2 grants were from renewal applications, a Chi-square test shows this difference
is not statistically significant. This indicates that exceptionally useful results can
be found without long lead times.


Table 15

COMPARISON OF GROUP 1 AND GROUP 2 GRANTS
BY TYPE OF APPLICATION


               Group 1            Group 2             Total
           Count  Percent     Count  Percent     Count  Percent

New          56     48.3       373     54.7       429     53.8
Renewal      60     51.7       309     45.3       369     46.2
Total       116    100.0       682    100.0       798    100.0

Chi-square = 1.59 with 1 df.


Funding  Institutes 

Table  16  shows  the  distribution  of  funding  institutes  for  each  group.  There  is  no 
significant  difference  across  institutes  between  the  groups. 




Table 16

DISTRIBUTION OF FUNDING INSTITUTES
OF GROUP 1 AND GROUP 2 GRANTS


Institute    Group 1   Percent    Group 2   Percent

NIAID           12       10.3        74       10.9
NIAMD           36       31.0       163       23.9
NINDS           19       16.4       147       21.6
NCI              8        6.9        57        8.4
NIDR             0        0.0         2        0.3
NIEHS            0        0.0         5        0.7
NIGMS           12       10.3        66        9.7
NICHD           10        8.6        45        6.6
NHLI            19       16.4       123       18.0
Total          116      100.0       682      100.0

Omitting NIDR and NIEHS, Chi-square = 4.33 with 6 df.


Funding  Mechanism 

Table 17 shows the distribution of funding mechanism (research project grant
(R01) or program project grant (P01)) in each group. The program project grants are
more heavily represented in the exceptionally useful category. The Chi-square statistic
of 17.2 with one degree of freedom shows this difference is significant at the
0.001 level. In terms of the number of most-cited articles, the difference between the
mechanisms is even larger.


Table 17

COMPARISON OF GROUP 1 AND GROUP 2 GRANTS BY FUNDING MECHANISM


Number of Grants

                         Group 1            Group 2             Total
Funding mechanism    Count  Percent     Count  Percent     Count  Percent

R01                    98     84.5       649     95.2       747     93.6
P01                    18     15.5        33      4.8        51      6.4

Chi-square = 17.2


Most-Cited Articles

                      Articles per Grant        Number of Articles
                                  Standard
Funding mechanism    Average     Deviation       Total    Percent

R01                    1.70         1.24          167       73.6
P01                    3.33         3.31           60       26.4
Both                   1.96         1.81          227      100.0


Awards in 1967

                         Group 1             Group 2              Total
Funding mechanism    ($000)  Percent     ($000)  Percent     ($000)  Percent

R01                   3,462    36.5      17,745    77.2      21,207    65.3
P01                   6,017    63.5       5,239    22.8      11,256    34.7
Total                 9,478   100.0      22,984   100.0      32,462   100.0



Since  program  project  grants  are  typically  funded  at  much  higher  dollar  levels 
than  research  project  grants,  we  must  also  consider  expenditures.  Table  17  shows 
the  total  dollars  awarded  in  1967  in  each  category.  Although  program  project  grants 
constituted  only  6.4  percent  of  the  grants  in  the  sample,  they  received  34.7  percent 
of  the  dollars  awarded  to  the  sample.  Thus,  the  percentage  of  expenditure  on  this 
type  of  grant  is  close  to  the  26.4  percent  of  exceptionally  useful  articles  produced 
through  it. 
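
The three percentages compared in this paragraph follow directly from the totals in Table 17. The sketch below simply recomputes them; the dictionary and function names are hypothetical.

```python
# Totals from Table 17 (grants, 1967 awards in $000, most-cited articles).
grants = {"R01": 747, "P01": 51}
dollars_thousands = {"R01": 21_207, "P01": 11_256}
most_cited_articles = {"R01": 167, "P01": 60}

def share(counts, key):
    """Percentage of the total accounted for by one funding mechanism."""
    return 100 * counts[key] / sum(counts.values())

p01_grant_share = share(grants, "P01")
p01_dollar_share = share(dollars_thousands, "P01")
p01_article_share = share(most_cited_articles, "P01")

print(round(p01_grant_share, 1), round(p01_dollar_share, 1), round(p01_article_share, 1))
# prints 6.4 34.7 26.4
```

Program project grants thus account for a share of most-cited articles (26.4 percent) that is far above their share of grants but somewhat below their share of dollars.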

This simple analysis shows little difference between the two funding mechanisms.
However, the choice of the single cutoff measure used to separate articles
into  two  categories  is  arbitrary;  the  question  of  the  difference  between  program 
project  and  research  project  grants  is  explored  more  fully  in  Section  VI,  where  a 
continuous  output  measure  is  used. 

1967  Funding  Decision 

An  interesting  question  to  examine  is  whether  these  grants  were  perceived  to 
be  potentially  more  important  in  1967  before  the  articles  were  published.  The 
awarding  process  allows  us  three  dimensions  on  which  to  evaluate  the  perceived 
importance  of  the  work.  One  would  expect  that  on  the  average  the  better  grants 
would  receive  a  larger  dollar  amount  per  year,  be  funded  for  a  longer  period  of  time, 
and  receive  a  better  priority  score.  Because  program  project  grants  differ  from 
research project grants on these dimensions and form a larger than expected proportion
of Group 1 grants, one must control for funding mechanisms before making
comparisons. 

The  data  comparing  the  two  groups  of  grants  on  these  dimensions  are  shown  in 
Table  18.  The  differences  between  the  groups  of  research  project  grants  in  the 
number  of  years  awarded,  size  of  the  award,  and  priority  scores  are  all  significant 
at  the  0.001  level  using  a  two-tailed  test.  For  the  groups  of  program  project  awards, 
only  the  difference  in  the  size  of  the  award  is  significant  at  the  0.01  level. 

The  existence  of  differences  in  size  of  award  and  years  awarded  may  have 
contributed to the difference in research output. However, the priority score awarded
should not affect research output, and thus this difference can be completely
attributed  to  the  power  of  the  study  sections  to  perceive,  before  the  research  is 
funded,  which  grants  are  more  likely  to  be  exceptionally  useful. 

Differences  Between  Application  and  1967  Award 

Both  the  dollar  amount  of  the  award  and  the  length  of  time  for  which  support 
is  committed  may  be  determined  to  a  large  extent  by  the  scientific  nature  of  the 
project.  The  cost  of  necessary  equipment  and  supplies  will  vary  by  the  type  of 
research as will the length of time required to bring the research to fruition. However,
there is frequently a difference between what is requested in the application
and  what  is  awarded.  It  is  reasonable  to  believe  that  the  request  reflects  the  scientific 
nature  of  the  project  and  that  differences  between  the  request  and  the  award  reflect 
an  attempt  to  limit  the  risk  of  expending  scarce  NIH  resources  on  projects  with 
uncertain  outcomes.  For  example,  a  project  that  would  take  five  years  to  exhaust 
all  scientific  avenues  may  be  funded  for  only  three  years.  This  could  be  done  on  the 
assumption  that  if  the  project  is  completely  successful  in  its  first  years,  it  will  return 




Table 18

COMPARISON OF CHARACTERISTICS OF 1967 FUNDING DECISION
FOR GROUP 1 AND GROUP 2 GRANTS


                                Group 1                Group 2
                                     Standard               Standard
                           Average  Deviation     Average  Deviation    t-Statistic

Number of years awarded
  R01 grants                 4.12      1.13         3.18      1.12         7.76
  P01 grants                 4.28      1.87         4.03      1.65         0.49
  All                        4.15      1.27         3.22      1.16         7.82

Award in 1967 ($000)
  R01 grants                35.1      18.3         27.3      21.0          3.57
  P01 grants               334.3     369.1        158.8     100.0          2.58
  All                       81.7     179.0         33.7      41.4          6.11

Priority score
  R01 grants               201.6      54.2        225.0      54.0          4.00
  P01 grants               211.2      58.5        217.9      65.0          0.35
  All                      203.0      54.7        224.7      54.5          3.98

with  a  renewal  application  and  be  funded  for  completion;  if  it  is  less  successful,  the 
money  that  would  have  supported  it  in  the  last  two  years  can  be  diverted  to  more 
useful  purposes.  Similarly,  a  project  may  be  supported  at  a  lower  dollar  level  than 
requested  either  because  it  is  somewhat  uncertain  and  funding  it  at  a  lower  level 
will  allow  more  projects  to  be  supported,  or  because  only  part  of  it  is  considered 
worthwhile.  In  these  cases,  the  decision  to  grant  the  award  under  terms  different 
from  the  request  reflects  an  assessment  of  either  the  expected  value  of  the  award 
or  the  risk  inherent  in  the  project. 

Differences  in  Numbers  of  Years  Awarded  and  Requested 

The  grants  in  Group  1  were  funded  for  a  longer  period  of  time  than  the  other 
grants.  Since  grants  are  not  awarded  for  more  years  than  are  requested  by  the 
applicant,  it  is  of  interest  to  determine  whether  the  longer  period  of  the  award  is 
entirely  because  Group  1  grants  requested  longer  time  periods,  or  if  they  were  less 
likely  than  other  grants  to  be  funded  for  a  shorter  period  of  time  than  requested. 

Figure 6 shows the distributions of the number of years requested and awarded
for all the R01 grants in our sample. Although 85 grant applications were for more
than five years, only seven of the grants ( or percent of all grants) received an award
for over five years. I assume that an award for five years means that the study section
had a good deal of confidence in the productivity level of the grant. I define a cut
in the years of support to mean that a grant was funded for a shorter time than
requested, and in addition fewer than five years of funding were committed. The
chance of being cut varies with the number of years of support requested. Although
32.4 percent of all R01 grants were cut, of those requesting three or fewer years of
support, fewer than 10 percent were cut, and of those requesting four or more years
of support about 50 percent were cut.

Fig. 6 - Histogram of years requested and awarded (R01 grants)
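
The definition of a cut, and the percentages quoted for R01 grants, can be expressed in a few lines. This is a minimal sketch: the function and variable names are hypothetical, and the counts are taken from the Total column of Table 19.

```python
def is_cut(years_requested, years_awarded):
    """A 'cut' as defined in the text: funded for fewer years than requested
    and for fewer than five years."""
    return years_awarded < years_requested and years_awarded < 5

# (not cut, cut) counts for all R01 grants, from the Total column of Table 19.
R01_COUNTS = {
    "3 or fewer years requested": (306, 24),
    "4 or more years requested": (199, 218),
}

def percent_cut(not_cut, cut):
    return 100 * cut / (not_cut + cut)

short_requests = percent_cut(*R01_COUNTS["3 or fewer years requested"])
long_requests = percent_cut(*R01_COUNTS["4 or more years requested"])
all_not_cut = sum(not_cut for not_cut, _ in R01_COUNTS.values())
all_cut = sum(cut for _, cut in R01_COUNTS.values())
overall = percent_cut(all_not_cut, all_cut)

print(round(short_requests, 1), round(long_requests, 1), round(overall, 1))
# prints 7.3 52.3 32.4
```

Note that under this definition a seven-year request funded for five years is not counted as cut, because five years of support were committed.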

In Table 19, I compare the actions of the study section in this regard on applications
in Group 1 (the most-cited category) and Group 2, while controlling for
number of years requested. There are differences between the two groups only in the
set of R01 grants for which more than three years of support were requested. In this
category only 26.9 percent of the Group 1 grants were cut, but 58.1 percent of the
other grants were cut. A Chi-square test shows this difference is significant at the
0.001 level.

Difference  in  Dollars  Awarded  and  Requested 

Another  way  the  grants  are  evaluated  is  in  terms  of  the  amount  of  money 
awarded each year. For R01 grants the average amount requested in 1967 was
$33,800, and the average amount actually awarded in direct costs was $28,400.
Approximately 22 percent of the grants were for less than 80 percent of the dollar
amount requested. The fraction funded is related inversely to the amount requested:
the correlation coefficient between the fraction awarded and the amount requested
is -0.32. Table 20 shows that the R01 grants in Group 1 requested significantly
larger amounts than those in Group 2, but the average fraction of the amount
requested that was actually received did not differ.




Table 19

COMPARISON OF LENGTH OF AWARD PERIODS OF GROUP 1
AND GROUP 2 GRANTS


                             Group 1  Percent    Group 2  Percent    Total  Percent

R01 requested ≤ 3 years
  Not cut^a                     18      90.0       288      92.9      306     92.7
  Cut                            2      10.0        22       7.1       24      7.3

R01 requested ≥ 4 years
  Not cut                       57      73.1       142      41.9      199     47.7
  Cut                           21      26.9       197      58.1      218     52.3

P01 requested ≤ 3 years
  Not cut                        2     100.0         8      88.9       10     90.9
  Cut                            0       0.0         1      11.1        1      9.1

P01 requested ≥ 4 years
  Not cut                       10      62.5        16      66.7       26     65.0
  Cut                            6      37.5         8      33.3       14     35.0

^a Received five years or number of years requested.


Table 20

COMPARISON OF GROUPS BY SIZE OF AWARD IN FY 1967


                                 Group 1    Group 2    t-Statistic

R01 Grants
  Number                            98        649
  Requested ($000)
    Average                        43.5       31.9        4.03
    Standard deviation             26.6       25.0
  Fraction of request awarded
    Average                        0.870      0.892       1.05
    Standard deviation             0.199      0.32

P01 Grants
  Number                            18         33
  Requested ($000)
    Average                       445.3      225.7        1.98
    Standard deviation            457.5      159.6
  Fraction of request awarded
    Average                        0.781      0.783       0.03
    Standard deviation             0.224      0.212



Probability  that  a  Grant  Will  Be  Exceptionally  Useful 

I next consider the joint relationship between all of the characteristics of the
decision made in 1967 and the probability that a grant will be exceptionally useful.
The study section reviewing the subsequent renewal applications decided to disapprove
eight of the grants in Group 1. The subsequent renewal applications for an
additional eight grants received below average priority scores. Since the value of
these 16 grants is open to some question, they have been omitted from the sample
for this portion of the analysis, and I consider as exceptionally useful only the
remaining 100 grants that produced at least one article among the most-cited 5
percent. This group comprises 80 R01 grants and 17 P01 grants.

I  use  a  logit  model  to  estimate  the  probability  that  a  grant  will  be  exceptionally 
useful  by  this  definition.  The  explanatory  variables  cover  all  aspects  of  the  1967 
funding  decision  and  are  defined  in  Table  21.  Using  the  logarithm  of  the  dollar 
amount  awarded  resulted  in  a  significantly  better  fit  to  the  data  than  was  obtained 
on  other  regressions  using  the  untransformed  dollar  amount. 
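
For reference, the standard form of a logit model is sketched below. The symbols p (the probability that a grant is exceptionally useful), b_0 (the constant), and b_i (the coefficient of X_i) are notation introduced here for illustration; the report itself does not write the equation out, so this should be read as the usual textbook form rather than the author's exact specification.

```latex
\log\frac{p}{1-p} = b_0 + \sum_{i=1}^{8} b_i X_i ,
\qquad
p = \frac{1}{1 + \exp\!\left(-\left(b_0 + \sum_{i=1}^{8} b_i X_i\right)\right)} .
```

In this form each coefficient is the change in the log-odds of being exceptionally useful per unit change in the corresponding explanatory variable.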


Table 21

DEFINITIONS OF EXPLANATORY VARIABLES FOR THE PROBABILITY
THAT A GRANT WILL BE EXCEPTIONALLY USEFUL


X1  Priority score

X2  Number of years for which support is committed

X3  Natural logarithm of the dollar amount awarded from FY 1967 funds

X4  Type of application; X4 = 0 if 1967 application was new; X4 = 1
    if it was a renewal

X5  Fraction of the dollar amount requested in 1967 that was awarded

X6  X6 = 1 if the application requested three or fewer years of support
    and if fewer years of support were committed than requested;
    X6 = 0 otherwise

X7  X7 = 1 if the application requested four or more years of support,
    was funded for less than five years, and for fewer years than
    requested; X7 = 0 otherwise

X8  X8 = 1 if a program project grant; X8 = 0 for research project
    grants

Since  program  project  grants  are  funded  at  much  higher  dollar  levels  than 
research  project  grants,  the  program  project  grant  effect  is  inseparable  from  the 
effect of the size of the award. Therefore, even though the dummy variable for
program project grants (X8) is not significantly different from zero in the first regression
in Table 22, that does not mean that program projects are not different from
research projects in the probability of being exceptionally useful; it means only that
if there is a difference, it is captured through the coefficient of X3, the size of the
award. The magnitude of the difference between funding mechanisms in the probability
that a grant will be exceptionally useful is shown in Table 23. The equation




Table 22

ESTIMATION OF THE LOGIT FUNCTION OF THE PROBABILITY
THAT A GRANT WILL BE EXCEPTIONALLY USEFUL AS
A FUNCTION OF AWARDING CHARACTERISTICS^a


                            Estimate 1     Estimate 2     Estimate 3

Number of data points           777            731            731
Chi-square statistic        94.1 (8 df)    67.8 (7 df)    64.1 (4 df)

Variable
X1 (priority score)          -0.00722       -0.00705       -0.00617
X2 (years funded)             0.443          0.573          0.545
X3 (size of award)            0.615          0.452          0.448
X4 (renewal)                  0.270          0.146          0.169
X5 (fraction awarded)        -1.196         -1.382
X6 (see Table 21)             0.245          0.473
X7 (see Table 21)             0.086         -0.004
X8 (program project)         -0.154
Constant                     -7.506         -6.123

NOTE: Coefficients derived by maximum likelihood technique.

^a Significance levels are derived using asymptotic t-test.
^b Significant at the 0.01 level.
^c Significant at the 0.001 level.
^d Significant at the 0.05 level.
^e Significance level = 0.07.
^f Significance level = 0.06.


Table 23

ESTIMATES OF THE PROBABILITY THAT A GRANT
WILL BE EXCEPTIONALLY USEFUL


Priority Score    R01 Grant    P01 Grant

     100            0.15         0.36
     150            0.11         0.28
     200            0.08         0.21
     250            0.06         0.16


of  Estimate  1  in  Table  22  is  evaluated  at  the  average  dollar  amount  awarded  in  each 
mechanism,  assuming  an  award  for  three  years. 

Because the program project grant effect may distort the effect of the size of
award, I have also run a similar regression restricting the data to research project
grants. The results are listed as Estimate 2 in Table 22. In both of these regressions
the priority score, number of years awarded, and size of the award are significant
and have the appropriate sign. There is no difference between new and renewal
grants. The two dummy variables identifying grants that receive support for fewer
years than are requested are not significant, probably because their effect is captured
in the number of years of support committed. The fraction funded of the dollar
amount requested, although only marginally significant, has a sign opposite to what
one would expect if a cut in the award level signified that the value of the grant was
less certain. Having these variables in the equation does not affect the values of the
other coefficients, as is seen in Estimate 3 in Table 22, where these variables were
dropped from the regression equation for research project grants.

The  effect  of  priority  score  on  the  probability  that  a  research  project  grant  will 
be  exceptionally  useful  by  this  definition  is  plotted  in  Fig.  7.  The  plots  use  Estimate 
3  from  Table  22,  assume  an  award  level  of  $30,000  per  year,  and  show  the  difference 
between  a  grant  deemed  worthy  of  a  commitment  of  five  years  of  support  and  one 
that  rates  only  three  years. 
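
Because a logit coefficient is a change in log-odds, the comparison drawn in the figure can also be summarized as an odds ratio, which does not depend on the constant term or on the award level held fixed. The sketch below is illustrative only: it uses the priority score and years-funded coefficients as read from the Estimate 3 column of Table 22, and the function names are hypothetical.

```python
import math

# Coefficients as read from the Estimate 3 column of Table 22.
B_PRIORITY = -0.00617  # per point of priority score (lower score is better)
B_YEARS = 0.545        # per year of committed support

def odds_ratio(coefficient, change):
    """Multiplicative change in the odds for a given change in one variable."""
    return math.exp(coefficient * change)

def probability(log_odds):
    """Logistic function: converts log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

five_vs_three_years = odds_ratio(B_YEARS, 5 - 3)
fifty_points_better = odds_ratio(B_PRIORITY, -50)

print(round(five_vs_three_years, 2), round(fifty_points_better, 2))
```

On these coefficients, a five-year commitment roughly triples the odds of being exceptionally useful relative to a three-year commitment, and a priority score 50 points better raises the odds by roughly a third.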


Fig. 7 - Estimates of the probability that a grant will be “exceptionally useful”


V.  A  CITATION  MEASURE  OF  RESEARCH  OUTPUT 


The  use  of  most-cited  articles  results  in  only  two  levels  of  research  output.  In 
this  section  I  use  finer  research  output  measures,  the  average  number  of  citations 
per  publication  or  the  total  number  of  citations  per  grant.  The  citation  data  must 
be  adjusted  to  account  for  the  time  pattern  in  which  citations  occur  and  the  size  of 
the  scientific  field  for  which  the  grant  was  awarded.  This  section  describes  the  way 
these  problems  were  dealt  with  and  the  selection  of  an  output  measure  from  the 
variety of possibilities available in the data. I then present a model of the determinants
of research output and finally discuss the relationship between the characteristics
of the 1967 decision to fund these grants and the chosen citation measure of
research output.


TIME  PATTERN  OF  CITATIONS 

The  data  retrieved  gave  the  number  of  citations  that  occurred  in  1968-1972  for 
a  series  of  publications  appearing  in  1966  to  1970.  Since  the  number  of  citations  per 
year  tends  to  rise  through  the  first  2  to  4  years  following  publication  and  then 
decline, the available data cannot be used without controlling for the year of publication.
While I was developing a method for doing this, I was able to generate a great
deal  of  information  on  the  time  pattern  of  citations.  This  information  is  contained 
in  Appendix  A  along  with  the  more  technical  details  of  the  model  used  to  adjust  the 
citation  data  to  account  for  year  of  publication.  It  is  shown  there  that  citations  to 
the  most  frequently  cited  articles  occur  later  in  time  and  that  there  are  only  small 
differences  among  the  time  patterns  of  citations  to  articles  in  different  scientific 
fields.  Although  type  of  publication  has  an  effect  on  the  time  pattern  of  citations, 
the  number  of  nonjournal  articles  is  too  small  to  capture  that  effect  in  this  model. 

To  adjust  the  citations  to  each  article  retrieved  by  ISI  to  account  for  the  year 
of  publication,  I  use  the  available  data  to  estimate  the  number  of  citations  that  will 
occur  (or  have  occurred)  in  the  year  of  publication  and  the  six  following  years.  Define 
Z_i to be the number of citations to an article occurring i years after publication.
Then, for an article published in 1966, the data retrieved by ISI give Z_2, Z_3, Z_4, Z_5,
Z_6; for an article published in 1968, Z_0, Z_1, Z_2, Z_3, Z_4. For each article I wish to estimate
T, the total number of citations, where:

$$T = \sum_{i=0}^{6} Z_i .$$
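
Which of the Z_i are observed for a given publication year follows from the 1968-1972 citation window. The sketch below works this out and shows how T would be assembled from observed and predicted values; the function names are hypothetical, and the predicted value in the usage line is a made-up number standing in for the output of the regression model of Appendix A.

```python
FIRST_CITATION_YEAR, LAST_CITATION_YEAR = 1968, 1972  # citation window in the data
MAX_LAG = 6  # T covers the year of publication and the six following years

def observed_lags(publication_year):
    """Lags i (years after publication) for which Z_i is available in the data."""
    first = max(0, FIRST_CITATION_YEAR - publication_year)
    last = min(MAX_LAG, LAST_CITATION_YEAR - publication_year)
    return list(range(first, last + 1))

def missing_lags(publication_year):
    """Lags in 0..6 that must be estimated rather than observed."""
    seen = set(observed_lags(publication_year))
    return [i for i in range(MAX_LAG + 1) if i not in seen]

def estimate_total(observed, predicted):
    """T = sum of the observed Z_i plus model predictions for the unobserved Z_i."""
    return sum(observed.values()) + sum(predicted.values())

for year in range(1966, 1971):
    print(year, observed_lags(year), missing_lags(year))

# Hypothetical article: observed Z_2 = 3 and Z_3 = 4, predicted Z_0 = 1.5.
example_total = estimate_total({2: 3, 3: 4}, {0: 1.5})
```

For 1970 publications only Z_0 through Z_2 are observed, so four of the seven terms in T must be predicted.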

The  model  (described  in  detail  in  Appendix  A)  specifies  an  equation  for  an 
unknown Z_i in terms of a known set of {Z_j}. The predictive ability of some of these
equations  is  shown  to  be  approximately  as  good  as  the  fit  to  the  regression  line, 
because  of  the  large  number  of  data  points  used  in  each  regression. 

For each publication I then estimate T as the sum of the available Z_i, plus the
estimates derived from the regression model. This T contains within it an error
whose size can be estimated in order to determine how useful the values of T can
be as the measure of research output. If one assumes that six years of citation data
are enough to evaluate the short term effects of research, then the estimates of the
error in T show the loss incurred in obtaining fewer than six years of data. The
estimates of the standard deviation of the error in the estimate of T for each year’s
publications are listed in Table 24. The error in the estimate for articles published
in 1967 is smaller than for articles published in 1966. One can predict the number
of citations seen in the sixth year following publication more accurately than in the
first year. It is clear from a comparison of the standard error of the prediction for
1970 articles with its standard deviation that the prediction of the total number of
citations of articles published that year is very poor.


Table 24

ERROR IN THE ESTIMATE OF TOTAL NUMBER
OF CITATIONS PER ARTICLE


Year  of  Publication 

1966 

1967 

1968 

1969 

1970 

Standard  error  of  estimate 

2.88 

2.19 

4.24 

10.66 

21.76 

Average 

13.28

12.39 

12.08 

14.05 

Standard  deviation 

Data  used 

22.30 

19.00 

21.68 

23.97 

Years 

2-6 

1-5 

0-4 

0-3 

0-2 

Average 

11.18 

10.41 

8.58 

7.56 

7.29 

Standard  deviation 

19.59 

16.07 

14.83 

12.38 

9.3 

Number  of  publications 

624 

1143 

1264 

1065 

255 

W 

0.979 

0.988 

0.955 

0.716 

-0.18 

Let σ_T be the actual standard deviation of T, the total number of citations per
article occurring in years 0-6. Consider the measure

$$W = 1 - \frac{\sigma_e^2}{\sigma_T^2} ,$$

where σ_e^2 is the variance in the error of an estimate of T. W measures the percentage
of the variance in T captured by the available data and this model. It is thus a
measure of the percentage of total information about each article gained through
the sample and the model. If I had gathered data for each year of citations I would
know T exactly, and σ_e = 0 and W = 1. If I had no data on citations to each article,
but knew exactly the mean value of T, then W = 0. The last line in Table 24 gives
the value of W for each year’s data using the approximately correct value of σ_T =
20. An interesting observation is that 95 percent of the information about citations
occurring in years 0-6 is available in citations occurring in years 0-4. Thus, if seven
years of citation data are enough to measure the short term use of research, then
five years of data are also enough.* As can be seen from W for the articles published
in 1969, the absence of Z_4 leads to a significant loss of information. For the 1970
articles we have only the number of citations seen in years 0-2, and they yield very
little information about total citations seen in years 0-6. One would be better off
using the actual mean citation rate than the estimates of this model for citations
per article. Because of the usefulness of the model for predicting total citations for
the other years’ publications, I believe the basic problem lies in the lack of information
contained in Z_0, Z_1, and Z_2 rather than in the model. Citations during the first
three years after publication do not indicate the future importance of an article.
Therefore, instead of searching for a better prediction of citations from the available
data, I chose to remove the 1970 publications from the sample. All the succeeding
analysis is based solely on publications occurring in 1969 or earlier.
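
The W row of Table 24 can be recomputed from the standard errors in the same table. The sketch below assumes that W is one minus the ratio of the error variance to the variance of T, which is my reading of the description in the text, and it uses the stated value of 20 for the standard deviation of T; the names are hypothetical.

```python
SIGMA_T = 20.0  # approximate standard deviation of T, as stated in the text

# Standard error of the estimate of T by year of publication (Table 24).
standard_error = {1966: 2.88, 1967: 2.19, 1968: 4.24, 1969: 10.66, 1970: 21.76}

def information_share(sigma_e, sigma_t=SIGMA_T):
    """W = 1 - (error variance) / (variance of T); an assumed reading of the text."""
    return 1.0 - (sigma_e ** 2) / (sigma_t ** 2)

w_by_year = {year: information_share(se) for year, se in standard_error.items()}

for year, w in w_by_year.items():
    print(year, round(w, 3))
```

Under this assumption the computed values agree with the W row of the table, including the negative value for 1970, which arises because the error variance for that year exceeds the assumed variance of T.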

When one adds publication delays to the length of time that must elapse following
publication, it appears that counts of citations can be used only to evaluate
research  that  is  at  least  five  or  six  years  old.  Although  this  delay  is  acceptable  in 
an  examination  of  broad  questions  of  management  policy,  it  is  too  long  for  citations 
to  be  used  for  evaluating  requests  for  renewal  of  existing  grants. 


SIZE  OF  SCIENTIFIC  FIELD 

Before  I  examine  the  use  of  citations  as  a  measure  of  research  output,  it  is 
necessary  to  control  for  the  size  of  the  scientific  field  of  research  grants.  In  a  field 
that  contains  only  a  small  number  of  researchers,  the  number  of  citations  per  article 
might  be  smaller  than  in  a  large  field.  Therefore  the  grants  should  be  grouped  into 
large  sets  such  that  each  contains  all  the  grants  in  scientific  fields  of  approximately 
the  same  size. 

As  a  starting  point  I  used  the  principal  investigator’s  department  in  the  medical 
school  as  a  surrogate  for  scientific  field  and  then  examined  the  average  citation  rate 
of  articles  by  department.  Within  the  basic  science  departments  only.  Anatomy 
appeared  to  differ  from  the  rest.  Within  the  clinical  sciences  the  small  fields  of 
forensic  medicine,  physical  medicine,  public  health,  and  radiology  all  have  very 
small  citation  rates  and  are  treated  separately.  Psychiatry  is  also  placed  with  this 
group.  The  two  largest  departments,  surgery  and  medicine,  are  different  from  each 
other  and  form  the  nuclei  for  two  additional  groups.  The  definitions  of  groups  used 
here  are  displayed  in  Table  25  with  the  average  citation  rate  for  journal  articles  in 
each group. Further details on the method used to produce these groups are found in
Appendix B.
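
As an illustration of the grouping step, the following minimal sketch maps each department to its Table 25 group and averages citations per journal article within each group. The department-to-group mapping comes from Table 25; the function name and the sample records are hypothetical and are not data from the study.

```python
from collections import defaultdict

# Department-to-group mapping, taken from Table 25.
FIELD_GROUP = {
    "anatomy": "Anatomy",
    "biochemistry": "Other basic science",
    "biophysics": "Other basic science",
    "microbiology": "Other basic science",
    "physiology": "Other basic science",
    "medicine": "Medical group",
    "obstetrics and gynecology": "Medical group",
    "pathology": "Medical group",
    "pediatrics": "Medical group",
    "surgery": "Surgical group",
    "anesthesiology": "Surgical group",
    "forensic medicine": "Miscellaneous clinical",
    "physical and rehabilitation medicine": "Miscellaneous clinical",
    "psychiatry": "Miscellaneous clinical",
    "public health": "Miscellaneous clinical",
    "radiology": "Miscellaneous clinical",
}


def citation_rate_by_group(articles):
    """Average citations per article within each field group.

    `articles` is an iterable of (department, citations) pairs,
    one pair per journal article.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [citation sum, article count]
    for department, citations in articles:
        group = FIELD_GROUP[department.lower()]
        totals[group][0] += citations
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


# Hypothetical records, for illustration only.
sample = [
    ("Anatomy", 6),
    ("Anatomy", 10),
    ("Surgery", 8),
    ("Anesthesiology", 9),
    ("Medicine", 14),
]
rates = citation_rate_by_group(sample)
```

Groups with no articles in the input simply do not appear in the result.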


A  MEASURE  OF  RESEARCH  OUTPUT 

For  each  grant  I  have  a  variety  of  information  concerning  the  publications. 
Table  26  lists  the  average  number  of  publications  of  each  kind  per  grant  and  the 
average citation rate. These data allow one to consider many possible output measures:
number of publications, citations, and average citation rates of each kind of


¹ If only five years of data are available, citations occurring in the last year or two should be weighted
more  heavily,  as  can  be  seen  from  the  coefficients  of  the  regression  model  presented  in  Table  A-6. 




Table 25

GROUPINGS OF SCIENTIFIC FIELDS

                          Number of   Citation
Group                      Grants       Rate     Component Fields

Anatomy                       51        7.62     Anatomy
Other basic science          240       16.22     Biochemistry, biophysics, microbiology,
                                                 and physiology
Medical group                333       13.74     Medicine, obstetrics and gynecology,
                                                 pathology, and pediatrics
Surgical group               124        8.45     Surgery and anesthesiology
Miscellaneous clinical        50        4.61     Forensic medicine, physical and
                                                 rehabilitation medicine, psychiatry,
                                                 public health, and radiology

Table 26

PUBLICATIONS AND CITATIONS BY TYPE OF PUBLICATION AND FUNDING MECHANISM

                               Journal          Books         Talks and        Theses            All
                              Articles                        Abstracts                      Publications
                               R01     P01     R01     P01     R01     P01     R01     P01     R01     P01

Number of publications        3172     998     224      82     800     190      20       6    4216    1276

Publications per grant
  Average                     4.25   19.57    0.30    1.61    1.07    3.73    0.03    0.12    5.64   25.02
  Standard deviation          5.31   25.02    0.81    2.49    1.88    4.85    0.22    0.48    6.72   30.69

Citations per publication
  Average                    13.03   12.34    4.10    3.00    1.16    1.19    1.45    1.07   10.21    9.96
  Standard deviation         21.66   21.89    7.25    6.68    2.94    3.50    2.81    1.51   19.46   19.86

Total citations per grant
  Average                    55.20  239.83    1.22    4.80    1.24    4.39    0.04   0.125   57.71  249.15
  Standard deviation        106.58  379.35    5.36   11.41    3.86   11.63    0.67    0.62  110.31  391.67

publication are obvious candidates. Questions naturally arise: Are publication
counts of additional value if one has citations? How much is a citation of a book
worth  compared  with  a  citation  of  a  journal  article? 

The  citation  rate  for  journal  articles  that  are  reviews  of  the  literature  is  higher 
than  the  citation  rate  for  most  articles  reporting  the  results  of  original  research. 
This has been reported in other studies² and verified for a subset of the articles in
this  study.  I  shall  examine  whether  review  articles  should  be  considered  separately, 
because  of  their  generally  higher  citation  rate. 

Because principal investigators may differ in their choices concerning the manner
of presenting research results, other output measures should be considered. For
example,  an  investigator  may  choose  to  publish  many  marginally  significant  pieces 
of  work  in  addition  to  his  major  work.  If  one  uses  only  average  citation  rates  per 


² See, e.g., Inhaber, 1974.




article, the output of such an investigator will appear to be less than the output of
an investigator who publishes only the major pieces of his work. Using total numbers
of citations favors an investigator who publishes his work piecemeal, since the
researcher citing the work is likely to reference several pieces. One way to compromise
between these two extremes is to use the average citation rate of the most cited,
say, 50 percent of the publications from each grant; but many other possibilities
could be suggested.
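
The compromise measure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name is hypothetical, the cutoff is rounded up so that at least one publication is always kept (the text does not say how an odd count should be split), and the citation counts are invented.

```python
import math


def top_fraction_citation_rate(citations, fraction=0.5):
    """Mean citation count over the most cited `fraction` of a grant's publications."""
    if not citations:
        return 0.0
    ranked = sorted(citations, reverse=True)
    # Assumption: round the cutoff up and always keep at least one publication.
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / k


# Two hypothetical investigators whose major work is cited 40 times.
one_major = [40]            # publishes only the major piece
piecemeal = [40, 3, 2, 1]   # also publishes marginal pieces

plain_mean_piecemeal = sum(piecemeal) / len(piecemeal)   # 11.5
top_half_piecemeal = top_fraction_citation_rate(piecemeal)
top_half_one_major = top_fraction_citation_rate(one_major)
```

With these invented counts the plain average penalizes the piecemeal publisher heavily (11.5 against 40), the total favors him slightly (46 against 40), and the top-half average (21.5 against 40) falls between the two extremes.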

To  decide  on  an  output  measure,  one  can  assume  that  the  judgment  of  the  study 
sections  on  renewal  applications  represents  an  assessment  of  the  quality  of  the 
research  performed  during  the  preceding  grant  period.  Then  for  a  subset  of  the 
grants  for  which  a  renewal  was  submitted  one  can  regress  the  score  received  on  the 
renewal  application  on  possible  output  measures  and  choose  one  or  more  of  those 
that  have  a  high  explanatory  power.  These  output  measures  can  then  be  applied  to 
the  entire  set  of  grants  to  examine  questions  of  policy  interest. 
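
The selection procedure described above can be sketched as follows. This is a simplified illustration, not the computation used in the report: it fits a separate one-variable least-squares line for each candidate measure, whereas the regressions reported below enter several variables at once, and the scores and measures shown are invented values.

```python
def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # share of the variance in y explained by x
    return slope, intercept, r2


# Hypothetical renewal priority scores and two candidate output measures.
renewal_score = [150, 200, 250, 300]
citation_rate = [20, 14, 11, 5]
publication_count = [4, 6, 4, 6]

candidates = {
    "citation_rate": citation_rate,
    "publication_count": publication_count,
}
r2_by_measure = {
    name: simple_ols(values, renewal_score)[2]
    for name, values in candidates.items()
}
best_measure = max(r2_by_measure, key=r2_by_measure.get)
```

The measure with the largest explanatory power is then the candidate carried forward to the full set of grants.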

Because  the  second  score  should  be  an  assessment  of  the  quality  of  the  research 
represented  in  the  publications  being  cited,  I  consider  only  second  applications  that 
were  reviewed  in  1970  or  later.  Both  the  level  of  the  award  and  publication  counts 
are significantly greater for program projects; I therefore use only second applications
for research project grants to develop an output measure. This measure will
later  be  used  to  compare  research  and  program  project  grants. 

A  dummy  variable  that  is  one  if  the  application  in  1967  was  for  a  renewal  grant 
and  zero  otherwise  is  used  as  a  control  because  some  new  applications  might  require 
a  period  of  time  before  any  results  could  be  published;  and  therefore  new  grants 
could  be  expected  to  have  a  smaller  publication  rate  than  renewal  grants.  The 
annual  dollar  level  of  the  award  was  used  to  control  for  the  magnitude  of  resources 
expended by each grant. In later work expenditure data for personnel and equipment
will be available and should provide more effective control for actual resources
expended.  To  allow  each  group  or  scientific  field  to  have  its  own  output  measure, 
regressions  should  be  run  separately  for  each  scientific  field.  This  can  be  done  for 
the  basic  science  and  medical  groups;  however,  the  remaining  three  groups  have  too 
few  grants  with  acceptable  renewal  applications  to  be  treated  separately.  Since 
these  three  groups  are  also  those  with  the  smallest  average  number  of  citations,  this 
part  of  the  analysis  will  treat  them  together. 

The  simplest  model  to  consider  allows  separate  variables  for  counts  and  average 
citation  rates  for  each  kind  of  publication.  The  few  theses  are  combined  with  talks 
and  abstracts.  The  counts  of  publications  exclude  publications  retrieved  from  the 
Research Grants Index for FY 1970 because these publications were available only
for  grants  funded  in  FY  1970  and  thus  were  the  result  of  a  good  priority  score  on 
the  renewal  application,  rather  than  a  cause. 

The  results  of  these  regressions  are  displayed  in  Table  27.  In  the  combined  group 
of  anatomy,  surgery,  and  miscellaneous  clinical  sciences,  there  is  no  significant 
relationship  between  any  of  the  variables  and  the  renewal  priority  score.  In  the  basic 
science  and  medical  groups,  the  only  output  variable  that  is  significant  is  average 
citation rate of journal articles. The dollar level of the award has a negative coefficient
in both groups and is statistically significant in the medical group.

Many  other  forms  of  output  measures  (e.g.,  total  number  of  citations,  maximum 
citations  of  a  single  publication)  were  regressed  in  a  similar  fashion  with  the  second 
renewal  priority  score.  Rather  than  present  all  of  these  regressions,  I  summarize  the 
conclusions  from  these  runs. 




Table 27

SECOND PRIORITY SCORE AS A FUNCTION
OF PUBLICATIONS AND CITATIONS

                                       Scientific Field Group

                                     Basic
                                    Science      Medical       Other

Number of grants                      117          144           73
R²                                    0.22         0.17         0.03
Constant term                        257.7        271.2        237.0
Journal articles, count             -0.277       -0.461        0.268
                                     (0.1)        (0.1)        (0.1)
Journal articles, citation rate     -1.892a      -2.100a      -0.514
                                     (3.9)        (4.5)        (0.4)
Books, count                        -3.014        7.910       -6.687
                                     (0.2)        (1.1)        (0.6)
Books, citation rate                -1.170        1.231       -1.534
                                     (0.4)        (0.4)        (0.4)
Talks, theses, count                 0.333        0.420        2.023
                                     (0.1)        (0.1)        (0.1)
Talks, theses, citation rate        -3.743        3.104        5.327
                                     (1.3)        (1.1)        (0.5)
Renewal                              5.490        3.226       14.314
                                     (0.4)        (0.1)        (0.6)
Annual award ($000)                 -0.663       -0.775b      -0.180
                                     (1.3)        (2.2)        (0.6)

a Significant at the 0.001 level.
b Significant at the 0.05 level.


No  output  measure  was  statistically  significantly  correlated  with  the  second 
priority  score  for  the  group  of  grants  from  anatomy,  surgery,  and  the  smaller  clinical 
departments.  This  may  be  for  any  one  of  a  number  of  reasons:  the  sample  size  is 
small,  the  scientific  fields  in  the  group  are  diverse,  research  quality  may  actually  not 
be  expressed  in  either  publications  or  citations,  or  the  second  priority  score  may  not 
accurately  reflect  the  quality  of  the  preceding  research.  A  means  of  choosing  among 
these  competing  possibilities  is  not  readily  apparent. 

In  the  basic  science  and  medical  groups,  the  strength  of  the  relationship  between 
the  second  score  and  the  output  measures  is  not  very  sensitive  to  the  exact  choice 
of  output  measures,  although  average  citation  rates  performed  better  than  total 
citations; and citations of journal articles appear to be more important than citations
of other publications. In none of the cases that included a citation output
measure was there any remaining relationship between second score and any publication
count. Therefore I conclude that numbers of publications are not an additional
measure of research quality after citations have been included.

The  control  variable  on  the  type  of  1967  application  was  never  significant.  The 
annual  dollar  level  of  the  award  was  significant  in  a  majority  of  the  cases  and  always 
negative.  This  is  contrary  to  the  expectation  that  the  second  score  would  be  worse 
if  more  resources  were  required  to  produce  the  same  output.  It  may  be  that  citations 




measure  only  part  of  the  quality  judgment  of  the  study  section  and  that  larger  dollar 
awards  are  given  to  better  research  grants. 

As  part  of  Index  Medicus,  the  National  Library  of  Medicine  annually  publishes 
a "Bibliography of Medical Reviews," indexed by primary author, which can be used
to  designate  articles  in  the  sample  as  primarily  reviews  of  the  literature.  Because 
of  the  effort  required  to  check  each  of  the  approximately  4000  articles  against  this 
bibliography,  I  decided  first  to  check  a  single  scientific  group  to  see  if  a  separate 
citation  rate  for  review  articles  improves  the  fit  to  the  second  priority  score.  The 
medical  group  was  chosen  for  this  experiment,  and  articles  from  grants  with  second 
priority scores were matched against the "Bibliography of Medical Reviews." The
set  of  articles  designated  as  reviews  by  this  method  did  have  a  higher  than  average 
citation  rate,  but  no  improvement  in  the  fit  to  the  second  priority  score  was  obtained 
by  entering  separate  variables  for  average  citation  rates  of  review  and  nonreview 
publications.  Therefore,  I  did  not  try  using  review  articles  for  the  other  scientific 
fields. 

The  variable  that  has  been  chosen  from  these  regressions  to  represent  the 
research  quality  of  a  grant  is  the  average  number  of  citations  of  all  of  its  publications 
that  were  cited  at  least  twice  in  the  six  years  following  publication.  This  performed 
only slightly better than average citation rate of journal articles and average citation
rate for all publications. Eliminating works that were not cited may compensate
for  some  errors  in  the  data.  If,  for  example,  the  author’s  name  was  misspelled  in  the 
reference  to  the  work,  then  citations  to  the  work  would  not  have  been  retrieved.  This 
measure also eliminates 65 percent of the books, 86 percent of the talks and abstracts,
and 81 percent of the theses. Thus, publications other than journal articles
are counted in the research quality measure only when they were cited and therefore
presumably represented the communication of useful research results. This
measure  is  highly  correlated  with  the  average  citation  rate  of  journal  articles,  and 
these  two  measures  could  be  used  interchangeably  with  no  substantive  changes  in 
any  of  the  conclusions  presented  below. 
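
The chosen measure can be computed for a single grant as sketched below. The function names and the citation counts are hypothetical, and returning zero for a grant with no qualifying publications is an assumption made here for illustration; the text does not state how such grants were scored at this step.

```python
def citation_measure(citations, min_citations=2):
    """Average citations over publications cited at least `min_citations` times."""
    cited = [c for c in citations if c >= min_citations]
    if not cited:
        return 0.0  # assumption: no qualifying publications gives a measure of zero
    return sum(cited) / len(cited)


def qualifying_count(citations, min_citations=2):
    """Number of publications cited at least `min_citations` times."""
    return sum(1 for c in citations if c >= min_citations)


# Hypothetical grant: two cited journal articles, one barely cited talk,
# and two uncited abstracts.
grant = [30, 12, 1, 0, 0]
measure = citation_measure(grant)     # (30 + 12) / 2
count = qualifying_count(grant)
plain_average = sum(grant) / len(grant)
```

Dropping the little-cited items raises this grant's value from a plain average of 8.6 to 21.0, which shows why uncited talks and abstracts have little influence on the measure.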

The  results  of  the  regression  of  the  second  score  on  this  output  measure  are 
presented  in  Table  28.  The  coefficients  of  the  regressions  for  the  basic  science  group 
and  the  medical  group  are  numerically  close  and  statistically  indistinguishable.  The 
meaning  of  a  citation  does  not  appear  to  differ  between  these  two  fields,  and  they 
may  be  combined  in  further  analysis. 

From  the  statistically  significant  relationship  between  the  renewal  score  and 
this  and  other  citation  measures,  citations  do  appear  to  measure  research  quality. 
However,  the  relationship  is  actually  quite  weak,  as  can  be  seen  by  the  small  value 
of the R² of the equations. There are many possible sources of error in both of these
output  measures.  Assume  for  a  moment  that  there  is  an  objective  (and  unknown) 
cardinal  measure  of  the  scientific  merit  of  each  of  these  grants.  The  study  section 
judgment  of  the  renewal  application  is  a  judgment  not  directly  of  the  merit  of  the 
research  during  the  preceding  period  but  of  the  merit  of  the  renewal  application, 
which  may  propose  a  different  course  of  action  in  the  future.  The  citation  measure 
does not have this particular fault but instead suffers from a severe oversimplification
of the dimensions of scientific output evaluated by the IRGs. One might assume
that each of the two output measures would be more correlated with the hypothesized
cardinal measure than with each other.

For  any  particular  research  grant,  one  can  expect  a  difference  between  a  citation 




Table 28

REGRESSION OF SECOND PRIORITY SCORE ON A
CITATION MEASURE OF RESEARCH QUALITY

                                       Scientific Field Group

                                     Basic
                                    Science      Medical       Other

Number of data points                 117          144           73
R²                                    0.21         0.16         0.01
Constant term                        264.9        268.8        238.6
Citation measure                    -1.753a      -1.954a      -0.390
                                     (4.0)        (4.6)        (0.3)
Total publications                  -0.483        0.872       -0.021
                                     (0.5)        (0.9)        (0.0)
Renewal                              2.189        6.351        2.89
                                     (0.1)        (0.5)        (0.1)
Annual award ($000)                 -0.876       -0.687b      -0.221
                                     (1.92)       (2.1)        (0.7)

NOTE: t-statistics are in parentheses.

a Significant at the 0.001 level.
b Significant at the 0.05 level.


measure  of  research  quality  and  the  judgment  of  a  peer  review  panel  of  research 
quality.  However,  in  examining  the  results  of  different  policies  for  the  management 
of  research,  one  is  examining  the  output  of  large  groups  of  grants;  and  since  citations 
are  correlated  with  research  quality,  the  errors  in  individual  grants  should  average 
out.  Since  citations  are  cheap  to  obtain  compared  with  peer  judgments,  they  should 
be  useful  for  questions  of  research  policy. 

The  weakness  in  the  relationship  between  the  citation  measure  of  research 
output  and  the  second  priority  score  raises  many  questions.  Presumably  a  stricter 
grouping of scientific fields will improve the quality of the fit, but it is unlikely that
the R² can be raised even to 0.5. The simplest explanation is that a large part of
scientific merit remains that is not captured by the suggested citation measure. As
evidence in favor of this assertion, Table 29 shows that the priority score received
in  1967  remains  an  important  explanatory  variable  for  the  second  score  after  the 
citation  measure  is  included  in  the  regression.  Section  VI  shows  that  this  remaining 
correlation  between  the  second  score  and  the  citation  measure  is  not  due  to  a  bias 
in  favor  of  the  medical  schools  that  receive  the  largest  share  of  NIH  money.  It  is 
reasonable  to  believe  that  the  part  of  the  scientific  merit  judgment  of  the  second  IRG 
that  can  be  explained  by  the  priority  score  of  the  original  IRG,  and  not  by  the 
citation  measure,  represents  a  valid  prediction  by  the  original  IRG  of  the  scientific 
merit  of  the  proposed  work.  When  the  priority  score  received  in  1967  is  entered  into 
the  equation,  the  coefficient  on  the  dollars  awarded  is  no  longer  significant.  Thus, 
this  variable  was  functioning  as  a  surrogate  for  research  quality  (the  best  grants 
receive  more  money),  rather  than  as  a  control  for  resources  expended. 




Table 29

REGRESSION OF SECOND PRIORITY SCORE ON ORIGINAL
PRIORITY SCORE AND CITATION MEASURE

                                       Scientific Field Group

                                     Basic
                                    Science      Medical       Other

Number of data points                 117          144           71
R²                                    0.26         0.24         0.12
Constant term                        172.7        171.9        118.6
Citation measure                    -1.589a      -1.501a      -0.569
                                     (3.7)        (3.5)        (0.5)
Total publications                  -0.889        0.758       -0.752
                                     (0.9)        (0.8)        (0.4)
Renewal                              5.366        1.110        2.583
                                     (0.4)        (0.0)        (0.1)
Annual award ($000)                 -0.529       -0.528       -0.054
                                     (1.1)        (1.6)        (0.2)
Priority score                       0.393b       0.406a       0.580b
                                     (2.8)        (3.8)        (2.9)

NOTE: t-statistics are in parentheses.

a Significant at the 0.001 level.
b Significant at the 0.01 level.


A  MODEL  OF  RESEARCH  OUTPUT 

Because  of  the  uncertainty  about  the  value  of  citations  as  a  measure  of  the 
research  output  for  grants  from  other  than  the  basic  science  and  medical  groups,  the 
subsequent analysis concentrates on these two groups. I first examine the relationship
between publication and citation data and the number of years for which each
research  project  grant  was  supported  between  FY  1967  and  FY  1970.  The  data  are 
shown  in  Table  30. 

The  proportion  of  grants  for  which  no  publications  were  retrieved  is  very  high 
for  the  grants  that  received  only  one  or  two  years  of  support  during  this  period. 
However,  one  cannot  say  that  no  publications  were  actually  produced  by  these 
grants. The Research Grants Index lists only publications under funded grants each
year.  Therefore,  if  articles  were  published  after  the  termination  of  support  (as  is 
likely because of publication delays) they were not included in the sample. In addition,
in some years the budget allocated to the Division of Research Grants has not
been  adequate  to  process  all  the  funded  grants,  and  some  publications  may  have 
been  missed.  Because  of  the  large  proportion  of  the  grants  that  were  supported  for 
one or two years for which I have no publication data, it would be erroneous to
include the set of grants supported for only one or two years in any analysis of
research  output.  I  chose  to  include  in  the  analysis  the  smaller  proportion  of  grants 
that were supported for three or more years, but had no publications, on the assumption
that the productivity of many of these grants was lower than the productivity
of  grants  that  did  have  publications. 

In  Table  30,  one  can  see  that  the  chosen  citation  measure  is  better  (significant 
at  the  0.01  level)  for  grants  supported  for  four  years  than  for  grants  supported  for 




Table 30

YEARS OF SUPPORT AND OUTPUT MEASURES

(Research project grants from basic
science and medical groups)

                                              Average          Average Value
Years of              Percentage with       Publications        of Citation
Support    Number     No Publications        per Grant       Measure per Grant

   1          25           76.0                3.68                3.68
   2          60           43.3                3.26                8.96
   3         204           18.6                5.07               10.71
   4         249            7.6                8.75               15.84

only  three  years.  This  conclusion  remains  even  if  we  eliminate  the  grants  with  no 
publications.  Of  the  grants  supported  for  only  one  or  two  years  and  for  which 
publications  were  retrieved,  59  percent  were  renewal  applications.  Many  of  these 
grants  received  a  shorter  than  usual  award  in  order  to  complete  work  that  was  well 
under  way.  Because  of  uncertainty  about  the  output  from  the  grants  funded  for  only 
one  or  two  years  for  which  no  publications  are  available,  I  draw  no  conclusions  about 
the  research  quality  of  grants  funded  for  only  short  periods  of  time. 

There are two dimensions of research output. The first is the number of publications
in the sample that were cited at least twice. I include only these publications,
on  the  grounds  that  only  such  publications  communicated  useful  research  results, 
and  thus  this  count  is  correlated  with  the  quantity  of  research  results  produced.  The 
other  dimension  is  the  average  citation  rate  of  these  publications,  which  I  take  as 
a  measure  of  research  quality. 

The determinants of research output include the magnitude of resources devoted
to the research, for which I use the number of years of support received between
FY  1967  and  FY  1970  (either  three  or  four  years  in  this  sample),  and  the  total  dollars 
awarded  during  this  period  including  supplements.  In  all  regressions  in  this  and  the 
following  section  I  use  the  natural  logarithm  of  dollar  amounts,  which  gave  a  better 
fit  to  the  data  than  using  just  dollars.  For  many  of  the  regressions  to  be  presented 
I  also  tried  entering  both  the  untransformed  dollar  amount  and  its  square.  The  fit 
was  about  as  good  as  using  the  log,  and  there  were  no  significant  differences  in  the 
magnitudes of other coefficients. An important control variable is whether the application
was a new or renewal application, since the average number of publications
is  significantly  higher  for  renewal  applications,  probably  because  of  the  lead  time 
required  to  accomplish  published  work. 

Another possible determinant of research output is the kind of medical school³
where  the  principal  investigator  works.  Is  better  research  carried  on  in  a  medical 
school  highly  oriented  to  research?  As  a  surrogate  for  being  very  research  oriented 
I  chose  to  examine  the  14  medical  schools  that  received  the  largest  share  of  NIH 
grant awards in FY 1967. In subsequent work I shall consider other possible determinants
of research output: age, rank, and administrative position. However, here I
present  the  preliminary  results  using  just  the  aforementioned  variables. 

³ Grants to hospitals and research organizations are combined with grants to their affiliated medical
schools. 




In preliminary regressions I found that better fits may be obtained to the logarithm
of the dependent variables than to the untransformed variables. The choice
of the logarithm is also appropriate because of the heteroscedasticity⁴ of both dependent
variables, although this is much more pronounced in the publication count
than in the average citation rate. The actual dependent variables are log (p + 1) and
log (c + 1) where p is the publication count and c is the citation measure. The results
are displayed in Table 31. Both outputs are related to the total dollars committed
to their support. The relationship between the citation measure and years of support
is weaker than that between the publication count and years of support. Renewal
grants and grants from the research-intensive schools produced both more publications
and higher quality publications by this measure.
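
One way to read coefficients from a model of this form is sketched below. The sketch assumes that the dependent variables use natural logarithms (the text states this explicitly only for dollar amounts) and takes two publication-count coefficients from Table 31; the function names are hypothetical.

```python
import math


def transform(value):
    """Dependent-variable transform described in the text: log(value + 1)."""
    return math.log(value + 1.0)


def dummy_effect(coefficient):
    """Factor applied to (output + 1) when a 0/1 variable changes from 0 to 1."""
    return math.exp(coefficient)


def dollar_effect(coefficient, ratio):
    """Factor applied to (output + 1) when dollars are multiplied by `ratio`.

    Holds when dollars enter the regression in logs of the same base
    as the dependent variable.
    """
    return ratio ** coefficient


renewal_factor = dummy_effect(0.4568)        # Table 31: renewal, publication count
doubling_factor = dollar_effect(0.3597, 2)   # Table 31: total dollars, publication count
```

Under these assumptions a renewal grant is associated with a value of p + 1 about 1.58 times that of a new grant, and doubling total dollars is associated with a factor of about 1.28, other variables held fixed.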


Table 31

DETERMINANTS OF THE OUTPUT VARIABLES

                                      Dependent Variables

                                   Publication     Citation
                                      Count         Measure

R²                                    0.24           0.13
Constant term                      [illegible]      -3.37
Years of support                     0.2100a        0.2341
                                     (2.6)          (1.9)
Total dollars                        0.3597b        0.3766a
                                     (5.0)          (3.3)
Research oriented                    0.2074a        0.3119a
  medical school                     (2.8)          (2.7)
Renewal                              0.4568b        0.4188b
                                     (6.4)          (3.7)

a Significant at the 0.01 level.
b Significant at the 0.001 level.


CHARACTERISTICS  OF  1967  DECISION 

The  next  question  to  be  addressed  is  the  relationship  between  the  output  of  the 
research  grants  and  the  characteristics  of  the  1967  decisions  to  fund  the  grant.  I 
again  consider  two  kinds  of  output:  the  total  number  of  publications  that  were  cited 
at  least  twice  and  the  average  citation  rate  of  the  publications.  The  characteristics 
of  the  1967  decision  are  the  priority  score,  the  number  of  years  for  which  support 
was  committed  in  1967,  and  the  logarithm  of  the  dollar  amount  awarded  in  1967. 

An  important  control  variable  is  whether  the  application  was  a  new  or  a  renewal 
application.  One  would  also  like  to  control  for  the  magnitude  of  the  resources 
expended  to  produce  this  output.  For  this  purpose  I  have  the  number  of  years  for 
which  the  grant  was  supported  between  1967  and  1970  (either  three  or  four  years 


⁴ The magnitude of the deviations from the regression line increases with increased values of the
dependent  variable.  See  Morris  and  Rolph,  1971,  for  an  introduction  to  the  problem  and  the  use  of 
transformation. 




for  this  sample)  and  the  total  dollars  awarded  in  this  period.  Unfortunately,  the 
correlation  coefficient  between  the  number  of  years  of  support  eventually  received 
and  that  awarded  in  1967  is  0.48,  and  the  correlation  coefficient  between  the  dollars 
awarded  in  1967  and  the  amount  awarded  during  the  total  period  is  0.85.  This 
correlation  makes  it  difficult  to  distinguish  between  the  effect  of  the  resources 
committed  and  the  prescience  of  the  NIH  system  in  awarding  more  productive 
grants  either  a  larger  dollar  amount  or  a  longer  period  of  support. 
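
The size of this problem can be gauged with the usual variance inflation factor. The sketch below is a simplification: it treats each pair of variables in isolation, so that the factor is 1 / (1 - r²), and it applies the correlations quoted above directly, although the regressions themselves use the logarithm of dollars and include other variables. The function name is hypothetical.

```python
def variance_inflation(r):
    """Variance inflation factor for a coefficient whose variable has
    correlation r with one other explanatory variable."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


vif_years = variance_inflation(0.48)     # years awarded vs. years of support
vif_dollars = variance_inflation(0.85)   # 1967 dollars vs. total dollars
```

On this rough reckoning the coefficient variances are inflated by factors of about 1.3 for the years variables and about 3.6 for the dollar variables, which is consistent with the dollar variables being the harder pair to separate.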

The results of these regressions⁵ are displayed in Table 32. Because of the
collinearity problem, I also display regressions without the control variables for
resources used. In the equations for publication count, renewal applications produce
significantly more publications, as expected. There is no relationship between publication
count and the priority score received in 1967, another indication that number
of publications is not related to research quality. The log of the dollars awarded in
FY 1967 is not significant when the total dollars expended to produce these publications
is included, but it is significant when that variable is excluded. Although the
number of years for which the grant was awarded is significant in both regressions,
the number of years of support does not come into the regression. Since this variable
is significantly related to the dependent variable (see Table 31), we again cannot be
sure whether there is a relationship between the number of years of support originally
awarded and the publication count. In summary, no unambiguous relationship
between  the  publication  count  and  the  characteristics  of  the  1967  decision  can  be 
found  in  the  data. 

When one examines the citation measure variable the situation is quite different.
The priority score variable is significant. The number of years awarded in 1967
is  significant  in  both  regressions,  and  since  the  number  of  years  of  support  is  less 
related  to  the  citation  measure  (see  Table  31)  one  can  assume  that  in  1967  the  NIH 
system  awarded  longer  periods  of  support  to  better  grants  initially,  as  well  as 
through  the  renewal  process.  The  relationship  between  the  dollars  awarded  in  1967 
and  the  citation  measure  has  the  same  ambiguity  as  was  found  in  the  publication 
count  equation,  and  so  no  conclusion  can  be  drawn. 

Renewal  applications  score  higher  on  the  citation  measure  after  one  controls  for 
priority  score  and  other  variables.  This  corroborates  the  observation  made  in  Sec. 
II  that  new  applications  appear  to  receive  a  bonus  in  priority  score  compared  with 
renewal  applications.  As  mentioned  in  that  section,  this  may  be  due  to  desirable 
risk-prone  behavior  on  the  part  of  the  study  sections. 


⁵ Again the actual dependent variables used in these equations are log (p + 1) and log (c + 1) where
p  =  number  of  publications  cited  at  least  twice  and  c  =  average  number  of  citations  of  these  publications. 




Table 32

OUTPUT VARIABLES AS A FUNCTION OF CHARACTERISTICS
OF DECISION IN FY 1967

                                               Dependent Variable

                              Publication   Publication    Citation     Citation
                                 Count         Count       Measure      Measure

Number of data points             453           453          453          453
R²                                0.25          0.22         0.14         0.12
Constant term                     5.57         -1.14        -4.24         0.90
Priority score                  -0.0001       -0.0000      -0.0026a     -0.0026a
                                 (0.1)         (0.0)        (2.3)        (2.4)
Years awarded                    0.1926b       0.2232b      0.1456a      0.1665a
                                 (4.4)         (5.4)        (2.3)        (2.4)
Log of dollars awarded          -0.111         0.3306b     -0.2082       0.3141c
  in 1967                        (0.7)         (4.5)        (0.9)        (2.9)
Renewal                          0.3758b       0.4517b      0.3511c      0.4312b
                                 (4.8)         (5.8)        (3.0)        (3.8)
Years of support                 0.0497                    -0.0239
                                 (0.5)                      (0.2)
Log of total dollars             0.505c                     0.6006a
  awarded                        (3.1)                      (2.5)

a Significant at the 0.05 level.
b Significant at the 0.001 level.
c Significant at the 0.01 level.


VI.  POLICY  QUESTIONS 


The  analyses  in  Secs.  II,  IV,  and  V  of  this  report  use  different  output  measures 
for research grants: the priority score received on a renewal application, the production
of at least one frequently cited article, and the average citation rate for publications
cited at least twice. In each case a relationship is established between the
output  measure  and  the  priority  score.  These  relationships  are  used  below  for  an 
evaluation  of  the  peer  review  process  as  a  system  for  selecting  which  applications 
for  research  project  grants  to  fund.  The  question  of  bias  in  the  study  sections’ 
judgments  of  applications  from  particular  medical  schools  is  then  explored. 

The  effect  of  the  funding  instrument  used  to  support  research  is  an  important 
policy  question.  The  data  available  in  this  study  allow  one  to  compare  publication 
and  citation  data  for  research  project  and  program  project  grants. 

The  policies  that  have  governed  NIH  include  dividing  research  funds  for  work 
related  to  disease  categories  according  to  national  priorities  rather  than  according 
to  the  demand  for  support  in  the  area  by  the  biomedical  research  community.  This 
policy  will  be  successful  only  if  good  researchers  can  be  attracted  to  an  area  by  the 
availability  of  funds.  In  order  to  investigate  this  possibility,  the  investigators  whose 
applications  were  assigned  to  more  than  one  Institute  of  NIH  are  characterized  in 
terms  of  their  previous  history  in  the  NIH  system.  The  level  of  funding  appears  to 
affect  the  research  plans  of  some  of  the  best  investigators,  as  well  as  others. 

The  total  level  of  funds  available  to  support  research  project  grants  has  declined 
considerably  during  the  last  six  years.  The  effect  of  this  decline  on  research  output 
as  measured  by  publication  and  citation  data  is  addressed  in  the  final  subsection  of 
this  report. 


THE  PEER  REVIEW  PROCESS 

Summary  of  the  Preceding  Analysis  of  the  Peer  Review  Process 

Section  II  shows  that  changes  in  the  average  priority  score  received  on  renewal 
applications  and  the  rate  at  which  renewal  applications  are  disapproved  imply  a 
verification  of  the  judgment  of  the  earlier  study  section  by  the  reviewers  of  the 
renewal  application.  Sections  IV  and  V  show  that,  at  least  for  grant  applications 
from  most  of  the  larger  basic  science  and  clinical  departments  of  medical  schools, 
the  judgments  of  the  peer  review  process  are  significantly  related  to  an  objective 
measure  of  research  output  derived  from  citations  to  articles  describing  the  results 
of  the  grant.  The  first  relationship  points  to  the  existence  of  criteria  for  scientific 
merit  that  are  shared  by  different  study  section  members.  The  second  relationship 
shows  that  these  criteria  are  related  to  objective  measures  of  research  output. 

The correlation coefficient between the score received on a renewal application
and the score received the previous time the grant was reviewed and funded is only
about 0.4. For the grants in our sample this low correlation can be explained in part
by the fact that the second priority score is more related to actual research output
than the original priority score. For example, Table 33 shows that although the
priority score awarded in 1967 was better for grants that would eventually produce
a most-cited article than for other grants, the magnitude of the difference in the
average of the subsequent renewal scores is over three times as large as the
difference in the original scores. We have seen (Sec. V) that the continuous citation
measure is also more strongly related to the score received on subsequent renewal
application than to the score received in 1967 before the research was performed.
These observations are consistent with the existence of a great deal of uncertainty
in preliminary judgments of the scientific merit of proposed research and an
adaptation by the second study section to the actual merits of the research and away
from the earlier appraisal of its apparent potential. It appears that the peer review
system is very flexible about adapting its judgment of applications to reflect
changes in merit.

Two of the output measures, subsequent renewal scores and citation rate, indicate
that new applications receive a slightly better priority score than renewal
applications of the same merit. Data presented by Douglass and James (1973) show
that in any year, approximately 10 percent of the principal investigators supported
by the NIH are being supported for the first time. The peer review process appears
to be quite open to new ideas and new people.


Table 33

PRIORITY SCORES FOR GRANTS THAT PRODUCED
AT LEAST ONE OF THE TOP 5 PERCENT
OF MOST-CITED ARTICLES

                                Most-Cited   Other
                                Grants       Grants

Number of grants                78           340
Average priority score
  awarded in 1967               200.3        217.6
Average priority score on
  subsequent renewal            178.9        232.4
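The "over three times as large" comparison drawn from Table 33 can be checked with a few lines of arithmetic. The sketch below only restates the four averages printed in the table; the variable names are illustrative and not part of the report.

```python
# Averages transcribed from Table 33 (a lower priority score is a better score).
score_1967 = {"most_cited": 200.3, "other": 217.6}
score_renewal = {"most_cited": 178.9, "other": 232.4}

# Gap between the two groups at each review.
gap_1967 = score_1967["other"] - score_1967["most_cited"]
gap_renewal = score_renewal["other"] - score_renewal["most_cited"]

# How much wider the gap is at renewal than it was in 1967.
ratio = gap_renewal / gap_1967

print(f"gap in 1967 scores:    {gap_1967:.1f}")
print(f"gap in renewal scores: {gap_renewal:.1f}")
print(f"ratio of gaps:         {ratio:.2f}")
```

The ratio comes out slightly above three, which agrees with the statement in the text.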

Institutional  Bias 

One question that has not yet been addressed is whether the peer review process
exhibits a bias in favor of certain institutions. This is one way of addressing the
meaningfulness of the charge of conflict of interest that has been leveled at the peer
review system.^1 Since the conflict of interest rules prohibit a study section member
from being present while an application from his institution is under consideration,
study section members are not placed in a position of direct conflict of interest.

The 14 medical schools that received the largest dollar amount of NIH grants
in FY 1967 increased their share of NIH grants between 1967 and 1973 at
approximately three times the rate of increase experienced by other medical schools
established earlier than 1966.^2 Section V showed that grants from this group of
schools average higher scores than other grants on the citation measure used for
research quality. Did schools that increased their share of NIH funds do so solely
because of their excellence in biomedical research or partly because of favoritism
to their institutions?

^1 See Sec. I.

The  schools  that  increased  their  share  of  NIH  funds  did  so  by  receiving  priority 
scores  that  were  good  enough  to  result  in  funding.  To  examine  this  question  I 
compare  the  extent  to  which  the  priority  score  can  be  predicted  by  the  research 
output  of  the  grant  and  the  extent  to  which  it  can  be  predicted  by  the  judgment  of 
study  sections  on  all  applications  from  the  same  medical  school.  Regression  analysis 
is  used  to  predict  the  priority  score  as  a  function  of  the  citation  measure  of  the  output 
of  the  grant  and  the  peer  groups’ judgments  of  all  applications  from  the  same  school. 

I first consider the judgments of the study section on competing renewal
applications received after FY 1967. The measure of research output is the number of
citations of each article that was cited at least twice. For the peer group's judgment
of the medical school, I use the average normalized priority score awarded to all
other competing R01 applications from the same school in the same fiscal year as
the renewal applications. If the coefficient of this variable is not significantly
different from zero, then one can conclude that the judgment of a particular
application is not colored by the perception of the quality of the medical school to
an extent inconsistent with the output of the grant. The sample is restricted to
renewal applications from the basic science and medical groups, since it is for this
set of grants that the citation measure appears to relate to research quality; only
renewal applications processed three or more years after 1967 are included so that
the available citation data represent the work the study section judged.

As  shown  in  the  first  regression  in  Table  34,  the  average  priority  score  awarded 
to  other  applications  from  the  same  school  has  no  explanatory  power  for  the  score 
received  on  these  renewal  applications  after  one  controls  for  the  citation  measure. 
A  dummy  variable  that  was  1  if  the  application  in  1967  was  also  a  renewal  is 
included  in  the  equation,  but  it  is  not  significant.  In  other  regressions  not  reported 
in  detail  here,  instead  of  the  average  priority  score  of  other  applications  from  the 
same  school,  I  used  a  dummy  variable  that  was  1  only  for  applications  from  the  14 
schools  that  received  the  most  money  from  NIH  in  1967.  Although  the  coefficient 
of  this  variable  was  significantly  related  to  the  second  score  (at  the  0.001  level)  before 
the  citation  measure  was  included,  it  was  not  significant  (even  at  the  0.15  level) 
afterward.  Therefore,  the  judgment  of  these  renewal  applications  shows  no  evidence 
of  distortion  by  a  judgment  of  the  medical  school. 
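The logic of this test can be sketched as a small regression exercise. The example below is only an illustration under stated assumptions: the data are synthetic, the variable names are hypothetical, and the solver is a plain normal-equations fit rather than the software used for the study. It regresses a priority score on a citation measure and on the average normalized score of other applications from the same school; a school coefficient near zero corresponds to the "no evidence of distortion" outcome described above.

```python
def fit_ols(predictors, response):
    """Least-squares fit of response = b0 + b1*x1 + b2*x2 + ... (normal equations)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(xtx[idx][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for row in range(col + 1, k):
            factor = xtx[row][col] / xtx[col][col]
            for c in range(col, k):
                xtx[row][c] -= factor * xtx[col][c]
            xty[row] -= factor * xty[col]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(xtx[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (xty[i] - tail) / xtx[i][i]
    return coef


# Synthetic, purely illustrative data (not taken from the report).
citation_measure = [2.0, 5.0, 8.0, 12.0, 20.0, 30.0, 15.0, 7.0]
school_average = [0.1, -0.2, 0.3, 0.0, -0.4, 0.2, -0.1, 0.5]
# Scores built to depend on citations only, so the school effect should vanish.
priority_score = [248.0 - 1.9 * c for c in citation_measure]

constant, b_citation, b_school = fit_ols(
    list(zip(citation_measure, school_average)), priority_score
)
print(f"constant={constant:.2f} citation={b_citation:.3f} school={abs(b_school):.3f}")
```

With real data one would also need standard errors to judge whether the school coefficient differs significantly from zero; the sketch shows only the point estimates.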

The  judgment  of  applications  in  1967  has  been  examined  in  a  similar  fashion, 
but  in  this  case,  missing  data  present  a  problem  because  only  priority  scores  on 
funded  applications  are  available  for  control  purposes.  The  results  of  the  regression 
are  shown  in  the  second  column  of  Table  34.  The  score  awarded  to  other  funded 
applications  from  the  same  school  is  significantly  related  to  the  priority  score  after 
the  citation  measure  is  used  to  control  for  research  output. 

The reason for the result is not clear. It may be that the study section's
expectation of the potential worth of an application is influenced by the general
level of research at the school more than is consistent with the differences in
research output among schools. Alternatively, it may only reflect the existence of
dimensions of research quality that are not captured by the citation measure but are
correlated with the general level of research quality at the medical school. Still
another possibility is that other explanatory variables related to the priority score
are also correlated in this sample with the average score received by applications
from individual medical schools.

^2 Carter et al., 1974.


Table 34

SECOND PRIORITY SCORE AS A FUNCTION OF RESEARCH OUTPUT
AND AVERAGE PRIORITY SCORE AWARDED TO GRANTS
FROM THE SAME MEDICAL SCHOOL

                            Priority Score          Priority Score
                            on Subsequent           on FY 1967
                            Renewal Application     Application

r²                          0.15                    0.06
Constant                    247.76                  229.10
Citation measure            -1.928^a                -0.6288^a
                            (6.6)                   (3.5)
Average priority score      5.062                   28.56^b
                            (0.3)                   (3.1)
Renewal application         3.44                    -2.727
                            (0.4)                   (0.6)

NOTE: t-statistics are in parentheses.

^a Significant at the 0.001 level.
^b Significant at the 0.01 level.


RESEARCH  PROJECT  AND  PROGRAM  PROJECT  GRANTS 

A  comparison  of  the  publications  and  citation  data  for  research  project  and 
program  project  grants  shows  one  dimension  of  the  differential  effect  of  these  two 
funding  instruments.  Again  the  discussion  is  restricted  to  grants  from  the  basic 
science  and  medical  groups  and  to  grants  that  were  supported  for  at  least  three 
years.  The  data  are  shown  in  Table  35.  The  29  program  projects  received  39  percent 
of  the  dollars  awarded  to  this  set  of  grants,  but  they  produced  only  26  percent  of  the 
publications.  In  further  work  it  will  be  possible  to  examine  expenditure  data  by 
categories  of  expenditure  in  an  attempt  to  find  an  explanation  for  this  difference  in 
rate  of  publications. 

The average value of the citation measure is slightly higher for program projects
than for research projects, but a two-tailed t-test shows the difference is significant
only at the 0.11 level. Histograms of the citation measure are shown in Fig. 8. Almost
50 percent of the R01 grants have a citation measure less than 10, but only 14
percent of the P01 grants have such a low value. Because of the size of the award,
it may be that program project grants are likely to be awarded only if the chance
of significant results is high.
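The t-test quoted above can be approximated from the summary statistics in Table 35. The sketch below assumes a pooled-variance two-sample test, which the report does not state explicitly, and uses a normal approximation for the tail probability because the degrees of freedom are large; the function name is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err


# Citation measure per grant, transcribed from Table 35: P01 first, then R01.
t_stat = pooled_t(17.80, 9.17, 29, 13.53, 13.85, 453)

# With 480 degrees of freedom the normal curve is a close stand-in for the t curve.
p_two_tailed = math.erfc(abs(t_stat) / math.sqrt(2.0))

print(f"t = {t_stat:.2f}, two-tailed p = {p_two_tailed:.2f}")
```

Under these assumptions the two-tailed probability lands close to 0.10, in line with the 0.11 level reported. A test that does not pool the variances would give a noticeably smaller probability, so the pooled form appears to be the one used.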




Table 35

COMPARISON OF R01 AND P01 GRANTS^a

                                          R01 Grants        P01 Grants

Grants                                    453               29
Total awarded in first year ($000)        12,842 (61.3%)    8,093 (38.7%)
Average awarded in first year ($000)      28.3              279.1
Total publications                        2,998 (73.7%)     1,069 (26.3%)
  Average publication per grant           6.62              26.86
  Standard deviation                      7.17              35.12
Total publications cited at least twice   1,698 (74.9%)     570 (25.1%)
  Average number per grant                3.75              19.66
  Standard deviation                      4.40              18.59
Citation measure
  Average value per grant                 13.53             17.80
  Standard deviation                      13.85             9.17

^a Basic science and medical groups. All grants funded for at least three years.


Fig. 8 - Comparison of distribution of citation measure for program project and
research project grants (horizontal axis of each histogram: Citations)




To summarize, program projects produce somewhat fewer publications per dollar
than research project grants, but the usefulness of the publications is probably
slightly greater. Section V showed that the study section evaluation of scientific
merit was more correlated with citations than with publications. Consequently, it
seems reasonable to assume that the program project grants awarded in 1967
represented at least as good an investment as the traditional research projects.
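The "fewer publications per dollar" statement can be illustrated with the totals in Table 35. The sketch below uses first-year award dollars as the denominator, since that is the only dollar figure the table reports; this is a simplifying assumption, and the dictionary keys are illustrative names.

```python
# First-year awards in $ thousands and total publications, from Table 35.
grants = {
    "R01": {"dollars_000": 12_842, "publications": 2_998},
    "P01": {"dollars_000": 8_093, "publications": 1_069},
}

total_dollars = sum(g["dollars_000"] for g in grants.values())
total_pubs = sum(g["publications"] for g in grants.values())

summary = {}
for name, g in grants.items():
    summary[name] = {
        # Share of all first-year dollars and of all publications.
        "dollar_share": g["dollars_000"] / total_dollars,
        "publication_share": g["publications"] / total_pubs,
        # Publications per thousand first-year dollars.
        "pubs_per_1000_dollars": g["publications"] / g["dollars_000"],
    }
    s = summary[name]
    print(
        f"{name}: {s['dollar_share']:.1%} of dollars, "
        f"{s['publication_share']:.1%} of publications, "
        f"{s['pubs_per_1000_dollars']:.3f} publications per $1000"
    )
```

The shares reproduce the percentages printed in the table, and the per-dollar rate for program projects is lower than that for research projects.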


EFFECT  OF  LEVEL  OF  FUNDING  ON  APPLICATIONS  TO 
INSTITUTES 

Congress appropriates money for the NIH based on its priorities for research
related to particular disease categories. Increased demand from the biomedical
research community for support of research in an area will often follow the
announcement of a particular high-priority program. For example, the number of
applications for new research project grants in areas of interest to the National
Cancer Institute increased by 136.2 percent between the declaration of the war on
cancer in 1970 and 1973. During the same period the number of new applications
for research project grants for all of NIH rose only 34.6 percent, and the maximum
increase experienced by any other Institute was 54.1 percent. However, if the policy
of shifting emphasis on particular disease-related research areas is to be completely
successful, it is necessary not only to attract researchers to an area but to attract
some of the best researchers.

The  research  of  many  principal  investigators  appears  to  change  over  time  so  as 
to  be  more  appropriate  for  different  Institutes,  and  I  compare  the  characteristics  of 
medical  school  principal  investigators  whose  applications  are  assigned  to  more  than 
one  Institute  with  the  characteristics  of  those  who  apply  only  in  areas  of  interest 
to  a  single  Institute.  The  purpose  of  this  exploration  is  to  determine  whether  shifts 
in  the  availability  of  research  funds  in  different  areas  can  affect  the  choice  of  topics 
of  good  researchers.  I  confine  the  analysis  to  new  applications. 

New  applications  are  divided  into  four  classes  depending  on  whether,  before  this 
application,  the  principal  investigator  has: 

(1) applied only to this Institute,^3

(2)  applied  only  to  Institutes  other  than  the  Institute  of  this  application, 

(3)  applied  to  both  this  Institute  and  others, 

(4)  not  applied  for  an  ROl  grant. 
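The four classes above amount to a simple rule on the set of Institutes to which an investigator's earlier applications were assigned. The sketch below is one way to express that rule; the function name, argument names, and Institute codes in the example are hypothetical and chosen only for illustration.

```python
def history_class(prior_institutes, this_institute):
    """Return the class number (1 to 4) from the list above.

    prior_institutes: Institutes to which the investigator's earlier
    applications were assigned (any iterable; may be empty).
    this_institute: Institute to which the current new application is assigned.
    """
    prior = set(prior_institutes)
    if not prior:
        return 4  # no earlier application on file
    if prior == {this_institute}:
        return 1  # applied only to this Institute
    if this_institute not in prior:
        return 2  # applied only to other Institutes
    return 3  # applied to this Institute and to others


# Example: an investigator with earlier NHLI and NCI applications now applying to NCI.
print(history_class(["NHLI", "NCI"], "NCI"))
```

Because the classification depends only on the applications visible in the file, an investigator whose earlier applications predate the available records would fall into class 4 under this rule.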

Unfortunately, our copy of the IMPAC file contains all applications only from 1968
through 1973; therefore, I cannot define these classes based on each applicant's
entire history. Table 36 shows the proportion of new applications that fall in each
class using only the available data each year. By 1972 the percentage in each class
appears to have stabilized, and applications for only FY 1972 and FY 1973 money
are used in what follows. This characterization will allow distinction of investigators
who have pursued research topics of interest to more than one Institute from those
who have investigated only in areas of primary interest to a single Institute.

^3 Principal investigators actually apply to NIH as a whole rather than to any one
categorical Institute. An official of the Division of Research Grants determines the
Institute to which the application will be assigned based on the subject matter of the
research. In the following I shall use the phrase "an investigator applied to an
Institute" as a shortened version of "an investigator applied for support in a
research area of interest to an Institute."


Table 36

PERCENTAGE OF NEW APPLICATIONS IN EACH CLASS

Class                         1968    1969    1970    1971    1972    1973

Applied to same Institute     14.4    27.6    28.1    31.4    31.0    29.2
Applied to only another
  Institute                   6.8     11.3    14.4    16.1    16.8    18.2
Applied to both same and
  another Institute           1.7     5.0     7.7     9.5     13.0    13.4
No history                    77.1    56.0    49.8    43.0    39.2    39.2

Number of applications        2185    2147    2240    2335    2745    3014

Is there a difference in the quality of the investigators in these classes? An
argument might be made that people switch from one field to another primarily
because they are not good enough to receive support in the first field. An opposing
argument might be that only the best researchers have a broad enough knowledge
to discern opportunities for useful research in more than one field. If good
researchers do change their interests, and scientific opportunities are available to
the same researcher in more than one field, then the deciding factor in the choice of
research topics may be either the availability of funds or the desire to respond to
announced national objectives. Since the best hope of success in research lies in
enlisting the best qualified people, it is worth examining the kinds of researchers
who do change fields.

For  a  surrogate  for  the  quality  of  a  principal  investigator  I  use  the  fraction  of 
his  preceding  new  and  renewal  applications  that  were  recommended  for  approval 
and  the  average  priority  score  received  on  these  applications.  The  priority  scores  are 
normalized  to  account  for  the  differences  among  study  sections.  Because  of  the 
statistical  simplicity  of  independent  samples,  I  used  only  the  last  new  application 
(defined  by  the  starting  date  of  the  budget  period)  received  from  each  investigator 
in  the  1972-1973  period.  Table  37  shows  these  indicators  of  quality  for  each  class  that 
includes  a  previous  history. 
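One common way to normalize scores across study sections is to express each score in standard deviation units relative to its own section. The sketch below assumes that form of normalization; the exact procedure is not restated in this section, and the section labels, scores, and function name are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_study_section(applications):
    """Return (section, raw_score, z_score) tuples, with z computed within each section."""
    by_section = defaultdict(list)
    for section, score in applications:
        by_section[section].append(score)
    # Mean and population standard deviation of the raw scores in each section.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_section.items()}
    return [
        (section, score, (score - stats[section][0]) / stats[section][1])
        for section, score in applications
    ]


# Hypothetical raw priority scores (lower is better) from two made-up sections.
sample = [("A", 150), ("A", 200), ("A", 250), ("B", 220), ("B", 300), ("B", 380)]
normalized = normalize_by_study_section(sample)

for section, raw, z in normalized:
    print(f"section {section}: raw {raw}, normalized {z:+.2f}")
```

Under this convention a negative normalized score is better than the section average, and a score below -1 is more than one standard deviation better. A section whose scores are all identical would need special handling, since its standard deviation is zero.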

The  last  class  in  the  table  consists  of  applicants  who  have  applied  both  to  the 
Institute  of  this  new  application  and  to  other  Institutes.  The  average  number  of 
preceding  applications  is  almost  four,  and  both  the  fraction  of  preceding  applications 
that  were  disapproved  and  the  average  priority  score  on  approved  applications  are 
much  worse  for  this  group.  This  group  must  contain  many  people  whose  previous 
applications  were  not  funded  and  who  changed  their  research  plans  in  the  hope  of 
being  funded. 

A  comparison  of  the  remaining  two  groups  indicates  they  have  about  the  same 
average  number  of  preceding  applications.  The  average  priority  score  received  on 
earlier  applications  is  better  (significant  at  the  0.01  level)  for  the  group  that  is 
applying  to  this  Institute  for  the  first  time  but  has  applied  to  other  Institutes  in  the 
past.  The  fraction  of  preceding  applications  that  were  recommended  for  disapproval 
is  also  smaller  for  this  group,  although  this  difference  is  not  statistically  significant. 




Table 37

INDICATORS OF QUALITY OF RESEARCHERS BY SHORT-TERM RESEARCH HISTORIES

                                                        Preceding History

                                                                                  Average
                                                        Applications/ Fraction     Normalized
Class                       Individuals   Applications  Individual    Disapproved  Priority Score

Applied to only same
  Institute                 1300          2503          1.93          0.342        0.274
Applied to only another
  Institute                 729           1554          2.13          0.324        0.197
Applied to both same and
  another Institute         500           1976          3.95          0.398        0.423

                                                                      F = 22.7     F = 25.03

Table 38 shows the judgments of the study sections concerning these new
applications^4 for each class. The "no history" class gets significantly better average
priority scores and contributes a larger percentage of its applicants than any of the
other classes to the set of those who received priority scores better than one standard
deviation from the mean. The group that has previously applied both to this
Institute and to others receives the worst average judgments, but still 10.9 percent of
these applicants received scores better than one standard deviation above the mean.
Those who moved to a new Institute for the first time had slightly more of their
applications disapproved and an average priority score indistinguishable from the
remaining group. Since 14.2 percent of these applications were worthy of very good
scores and the entire class had a better than average history, one must conclude that
some very good researchers apply for research projects of interest to more than one
Institute.


Table 38

INDICATORS OF QUALITY OF NEW APPLICATIONS BY SHORT-TERM HISTORY
OF PRINCIPAL INVESTIGATORS

                                                         Average
                            Number of     Fraction       Normalized       Percentage of
Class                       Applications  Disapproved^a  Priority Score^a Scores < -1^a

Applied to only same
  Institute                 1275          0.293          0.067            12.0
Applied to only another
  Institute                 729           0.354          0.064            14.2
Applied to both same and
  another Institute         490           0.436          0.123            10.9
No history                  1809          0.365          -0.088           17.0

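Test statistics: chi-square = 33.00 (fraction disapproved); F = 6.6 (average
normalized priority score); chi-square = 13.76 (percentage of scores < -1)

^a Differences between classes are significant at the 0.05 level.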


^4 The numbers per class are slightly smaller than those in Table 37, because
applications that were deferred or received internal review have been removed.




The availability of funds may not be the primary motivation in switching
research interests. Table 39 shows the percentage of applications assigned to each
Institute in each category. As is to be expected from the earlier data, the largest
percentage of the applications assigned to a different Institute for the first time
during this period go to NCI. However, over 75 percent of these applications are
assigned to the other Institutes in approximately the same proportion as all
applications. For these applications, the availability of funds would not be any better
at the new Institute than at the Institute to which they had previously applied, and
the availability of funds is not likely to have been an important motivating force.
However, since for many of those who applied to NCI the availability of funds might
have been important, I examine similar quality measures for those applying to NCI.
The data are shown in Table 40. The relationship between classes is quite similar
to that observed in the entire set of applications. One must conclude that a
substantial number of very good researchers have moved toward subjects of interest
to the National Cancer Institute during this period.

Table 39

PERCENTAGE OF NEW APPLICATIONS IN SHORT-TERM HISTORY CLASSES
ASSIGNED TO EACH INSTITUTE

                                                                                           Total
Class                     NIAID  NIAMD  NCI    NEI    NIGMS  NICHD  NHLI   NINDS  Other  Percent  Applications

Applied to only same
  Institute               11.9   18.8   11.9   2.3    6.7    13.3   18.2   15.3   1.5    100      1300
Applied to only another
  Institute               8.0    12.6   23.5   2.2    13.3   13.6   12.2   7.3    7.4    100      724
Applied to both same and
  another Institute       10.8   21.6   16.2   3.6    10.0   13.8   15.0   5.8    3.2    100      500
No history                8.2    16.0   15.4   3.3    10.9   11.8   17.3   13.7   3.3    100      1838

Total                     9.6    16.9   15.8   2.8    9.9    12.8   16.5   12.2   3.5    100
Number of applications    417    737    690    125    434    558    719    532    151             4362

Chi-square = 206.2; significance level < 0.0001


EFFECT  OF  LEVEL  OF  FUNDING  ON  OUTPUT  MEASURES 

In this section I examine the quantity and quality of the research funded in 1967
that would still have been produced had the level of support for research program
projects then been lower. Publication and citation data should not be interpreted as
a cardinal measure. For example, although one may believe an article cited 30 times
is more useful than one cited only 10 times, there is no justification for believing it
is three times as useful. Therefore, the following data must be interpreted with care.

The preceding part of this section on the peer review process summarized the
evidence that the ranking of applications in order of decreasing priority scores
results in a ranking according to increasing values of both available indicators of
usefulness of research, citations and the priority score assigned to a subsequent
renewal application. Since within each program of an Institute the funding of grants
is approximately in priority score order, increasing increments of funding result in
the funding of research grants of lower expected quality. Because of the uncertainty
inherent in predictions of research output, there is much variance in the output of
grants at each priority score level. However, the statistical tests indicate that it is
very unlikely (less than one chance in 1000) that no relationship exists. Therefore,
if one assumes that each research project represents a separate unit of production,
then it may appear there are decreasing marginal returns to increasing
expenditures. Although this assumption is not true, as is discussed more fully below,
it still may be worth examining the loss represented by these output measures at
different funding levels.


Table 40

NEW APPLICATIONS ASSIGNED TO NCI

                                          Preceding History of Principal Investigators

                                                                                    Average
                            Number of                   Applications   Fraction     Normalized
Class                       Individuals   Applications  per Individual Disapproved  Priority Score

Applied to only same
  Institute                 151           309           2.05           0.447        0.271
Applied to only another
  Institute                 171           384           2.25           0.307        0.244
Applied to both same and
  another Institute         81            315           3.89           0.368        0.304

                                                                       F = 13.26    F = 0.38

                                          Judgment of the New Application

                                                        Average
                            Number of     Fraction      Normalized     Percentage of
Class                       Applications  Disapproved   Priority Score Scores < -1

Applied to only same
  Institute                 149           0.416         0.117          11.5
Applied to only another
  Institute                 171           0.416         0.082          12.9
Applied to both same and
  another Institute         78            0.372         0.075          8.2
No history                  280           0.446         -0.147         19.3

                                                        F = 2.0 (NS)

Assume  that  the  funding  of  grants  at  a  lower  total  level  of  expenditure  would 
be  in  priority  score  order,  and  consider  the  output  produced  by  the  grants  that  would 
be  funded  or  not  at  each  lower  level  of  expenditure.  The  second  row  of  Table  41  gives 
the  priority  score  pay  line — that  is,  I  assume  only  grants  with  priority  scores  at  least 
as  good  as  this  would  be  paid.  These  cutoffs  are  approximately  equal  to  90,  80,  60, 
and  40  percent  of  the  total  actual  1967  expenditures.  The  90  percent  cutoff  point  is 
probably  quite  close  to  the  pay  line  used  in  1967  in  most  Institutes.  As  explained  in 
Sec.  II,  it  appears  that  the  relationship  between  priority  score  and  quality  has 
remained  unchanged,  but  the  number  and  quality  of  the  applications  for  ROl  grants 
has  increased  since  1967;  therefore,  in  the  years  since  1967  one  would  have  to  fund 
more  grants  at  each  priority  score  level.  The  third  row  of  Table  42  shows  the  dollars 
required  to  fund  the  1973  applications  to  the  same  priority  score  level.  From  the  first 
column  one  can  see  that  to  fund  the  same  quality  mix  of  grants  in  1973  as  was  funded 
in  1967  would  have  required  an  additional  two-thirds  as  many  dollars.  In  1973 


61 


Table  41 

EFFECT  OF  LEVEL  OF  FUNDING  ON  OUTPUT  MEASURES 


Percent  of  actual  1967  fund¬ 
ing  level 

Priority  score  pay  level 
Total  dollars  awarded  in 

1967  ($  millions) 

1973  applications  with 
priority  score  equal 
or  better  than  pay 
line  ($  millions) 

90 

292 

57.1 

95.3 

80 

268 

50.9 

88.4 

60 

230 

38.1 

72.3 

40 

203 

25.5 

57.3 

Average  publications  per 

1967  grant 

With  priority  score 
better  or  equal 
to  pay  line 

6.7 

6.6 

7.2 

7.1 

With  priority  score 
worse  than  pay 
line 

5.8 

6.4 

5.8 

6.3 

Average  citation  measure 
Grants  with  priority 
score  better  or  equal 
to  pay  line 

13.8 

14.1 

15.8 

17.0 

Grants  with  priority 
score  worse  than 
pay  line 

10.0 

10.7 

10.2 

11.2 

Estimate  for  grants  at 
pay  line 

10.3 

11.2 

13.0 

14.2 

Number  of  grants  with 
citation  measure 
among  best  10 
percent 

Grants  with  priority 
score  better  or 
equal  to  pay  line 

43 

40 

36 

27 

Grants  with  priority 
score  worse  than 
pay  line 

3 

6 

10 

19 

Estimate  of  the  probability 
that  a  grant  at  pay  line 
would  be  among  the  best 

10  percent 

0.066 

0.076 

0.094 

0.109 

Table 42

REGRESSIONS OF CITATION MEASURES ON PRIORITY SCORE

                          Ordinary Least Squares    Logit Equation

                          Citations per             Probability Citation Measure
                          Cited Article             Exceeds Ninetieth Percentile

Measure of fit            F = 15.82, r^2 = 0.03     Chi-square = 4.6
Significance level        0.001                     0.03
Constant                  23.82                     -0.854
Priority score            -0.04717                  -0.00613

approximately 59.4 million dollars were awarded on a competitive basis to medical
schools.  If  the  1967  grants  were  funded  only  until  the  average  priority  score  of  the 
funded  grants  was  the  same  as  in  1973,  then  only  the  top  40  percent  of  these  grants 
would  have  been  funded.  This  is  partly  because  of  an  expansion  of  demand  for 
research  from  expanded  medical  school  faculty  and  partly  because  of  the  backlog  of 
unsupported  research  that  has  accumulated  in  the  1967-1973  period. 

Two output measures are used: total publications and average number of citations
of articles cited at least twice. In each case the average value of the measure
for  grants  in  the  basic  science  and  medical  groups  was  calculated  for  grants  that 
would  be  funded  at  each  pay  level.  There  appears  to  be  little  relationship  between 
funding  level  and  the  average  number  of  publications  per  grant.  The  average  value 
of  the  citation  measure  for  the  funded  grants  increases  from  13.8  to  17.0  as  the  pay 
line  goes  from  the  1967  level  to  approximately  the  1973  level.  The  estimated  value 
of  the  citation  measure  for  a  grant  with  a  priority  score  equal  to  the  pay  line  goes 
from  10.3  to  14.2. 

There  is  a  good  deal  of  variance  in  the  citation  measure  within  the  funded  and 
nonfunded grants at each pay level. Another measure of what is lost at each pay
level comes from examining the grants whose citation measure is among
the best 10 percent of all the grants. Although this set of grants is more
heavily  represented  in  the  funded  set  of  grants  at  each  pay  level,  the  probability  that 
a  grant  at  the  pay  line  would  be  in  this  class  remains  substantial  in  each  case.  At 
the  level  approximately  equal  to  the  1973  pay  level,  the  probability  that  a  grant  at 
the  margin  for  payment  would  be  in  this  set  exceeds  the  average  for  all  funded  1967 
grants. Data on the regressions used to calculate the values at the margin are
presented in Table 42.
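
The values at the margin can be approximately reproduced from the printed regression coefficients. The following Python sketch is an illustration added for clarity and is not part of the original analysis. It assumes that the estimate for grants at the pay line is the fitted ordinary-least-squares value at the pay-line priority score, and that the probability estimate is the logistic transform of the fitted logit equation. The function names are hypothetical.

```python
import math

# Coefficients as printed in the table of regressions of citation
# measures on priority score.
OLS_CONSTANT, OLS_SLOPE = 23.82, -0.04717       # citations per cited article
LOGIT_CONSTANT, LOGIT_SLOPE = -0.854, -0.00613  # log-odds of a top-decile citation measure

# Priority score pay levels from the funding-level table.
PAY_LEVELS = [292, 268, 230, 203]


def citation_measure_at(priority_score: float) -> float:
    """Fitted citation measure for a grant with the given priority score."""
    return OLS_CONSTANT + OLS_SLOPE * priority_score


def prob_top_decile_at(priority_score: float) -> float:
    """Fitted probability that the citation measure is among the best 10 percent.

    Assumes the logit equation models the log-odds, so the probability is
    the logistic function of the linear predictor.
    """
    z = LOGIT_CONSTANT + LOGIT_SLOPE * priority_score
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for score in PAY_LEVELS:
        print(score,
              round(citation_measure_at(score), 1),
              round(prob_top_decile_at(score), 3))
```

Under these assumptions the logit values agree with the tabulated probabilities (0.066, 0.076, 0.094, 0.109) to three decimals, and the linear estimates agree with the tabulated values at pay levels 268, 230, and 203. At pay level 292 the sketch gives about 10.0 where the table prints 10.3.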


POSSIBILITIES  FOR  FUTURE  RESEARCH 

This  analysis  has  ignored  the  interrelationships  among  the  grants.  The  actual 
usefulness  of  each  grant  will  depend  on  the  funding  of  additional  research  in  related 
areas. In this report I have considered only the quantity of citations and publications
from a grant. In further analysis, it would be desirable to trace the breadth of the
effects  of  these  research  grants  on  scientific  fields  as  revealed  in  citation  data,  which 
would  require  a  more  extensive  data  base.  In  particular,  it  would  be  desirable  to 
examine  second-generation  citations  (of  the  articles  that  cited  the  articles  produced 
under  these  grants).  An  analysis  of  the  effects  of  the  research  funded  in  1967  on 
grants  that  were  supported  subsequently  could  much  more  fully  address  the  concept 
of  decreasing  marginal  returns  to  increasing  expenditures.  For  a  subset  of  grants, 
the  citing  articles  could  be  examined  for  an  assessment  of  the  short-term  health-care 
effects of the research. The analysis could also consider the extent of the interdependence
of the basic research programs of the various Institutes. It has been
suggested  that  progress  in  cancer  research  may  be  seriously  hindered  because  of  the 
lack  of  funds  for  basic  research  in  other  Institutes.  However,  the  existence  of  the 
hypothesized  interrelationships  has  received  verification  only  by  anecdote. 

The  analysis  of  second-generation  citations  would  also  be  directed  to  differences 
in  citation  patterns  by  scientific  sub-fields.  As  an  example,  one  could  attempt  to  place 
these  sub-fields  on  a  continuum  ranging  from  fields  where  research  results  are  cited 
solely  within  a  small  group  of  researchers  working  on  the  same  problem  to  fields 
where  the  research  results  are  cited  in  journals  with  widely  varying  interests.  The 
relationship  between  these  citing  patterns  and  the  level  of  support  received  by  the 
sub-fields  from  NIH  should  be  of  interest  to  those  concerned  with  the  distribution 
of  NIH  funds  both  within  and  between  Institutes. 


Appendix  A 

TIME  PATTERN  OF  CITATIONS 


The rate at which citations occur tends to rise until two to four years have
elapsed following publication and then to decline, and the segment of the total
citation history that I retrieved differs by the year in which an article was published. The
primary purpose of this appendix is to explain the methodology used to adjust the
observed number of citations of each article to account for the year of publication.
In the course of developing this methodology, much information concerning the
time  pattern  of  citations  was  generated.  This  information  is  presented  because  it 
may  be  useful  to  others  who  are  exploring  citations  as  a  measure  of  research  output. 

To adjust the citations to each article retrieved by ISI to account for the year
of publication, I use the available data to estimate the number of citations that will
occur (or have occurred) in the year of publication and the six following years. Define
Z_i to be the number of citations to an article occurring i years after publication.
Then, for an article published in 1966, the data retrieved by ISI give Z_2, Z_3, Z_4, Z_5,
Z_6; for an article published in 1968, Z_0, Z_1, Z_2, Z_3, Z_4. For each article, I wish to
estimate T, the total number of citations, where:

    T = \sum_{i=0}^{6} Z_i .
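
The bookkeeping behind this definition can be sketched in Python. This is an illustration only, not the procedure used in the report. It assumes that the retrieved data cover citations made in calendar years 1968 through 1972, which is inferred from the two examples given above (lags 2 through 6 for 1966 articles, lags 0 through 4 for 1968 articles). The function names are hypothetical.

```python
# Assumed citation window, inferred from the examples in the text.
FIRST_CITATION_YEAR = 1968
LAST_CITATION_YEAR = 1972
MAX_LAG = 6  # T sums the counts at lags 0 through 6


def observed_lags(publication_year: int) -> list[int]:
    """Lags i for which the count Z_i is observed for this publication year."""
    return [i for i in range(MAX_LAG + 1)
            if FIRST_CITATION_YEAR <= publication_year + i <= LAST_CITATION_YEAR]


def missing_lags(publication_year: int) -> list[int]:
    """Lags i for which Z_i must be estimated."""
    seen = set(observed_lags(publication_year))
    return [i for i in range(MAX_LAG + 1) if i not in seen]


def estimate_total(observed: dict[int, float], predicted: dict[int, float]) -> float:
    """Sum observed counts where available and predicted counts elsewhere."""
    total = 0.0
    for lag in range(MAX_LAG + 1):
        total += observed[lag] if lag in observed else predicted[lag]
    return total


if __name__ == "__main__":
    for year in range(1966, 1971):
        print(year, observed_lags(year), missing_lags(year))
```

For an article published in 1970, for example, only lags 0 through 2 are observed under this assumption, so four of the seven terms of T have to be estimated.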

This  appendix  discusses  some  of  the  variables  that  affect  the  pattern  in  time 
during  which  citations  occur.  It  shows  that  citations  to  the  most  frequently  cited 
articles  occur  later  in  time  and  that  there  appear  to  be  only  small  differences  among 
the  time  patterns  of  citations  of  articles  in  different  scientific  fields.  The  discussion 
of  the  effect  of  type  of  publication  on  the  time  pattern  of  citations  depends  on  the 
model  used  to  estimate  T  and  therefore  is  postponed  until  the  model  is  introduced. 

The model specifies an equation for an unknown Z_i in terms of a known set of
{Z_j}. The predictive ability of some of these equations has been evaluated. The last
sections  of  this  appendix  demonstrate  the  magnitude  of  the  error  in  the  resulting 
predictions  of  T. 


ASSUMPTIONS 

I shall assume in what follows that the distribution of Z_i depends only on the
unknown T and not on the year of publication. This is only approximately true
because the set of journals covered by ISI changed between 1968 and 1972, but the
magnitude of this effect should be small.^1

The  time  of  the  year  during  which  an  article  was  published  will  affect  the 


^1 Between 1968 and 1972 the number of journals covered by ISI increased by 23 percent, and the
average  number  of  citations  per  cited  item  increased  by  5.4  percent  (ISI,  1972,  p.  18).  This  may  mean  that 
the  estimates  of  the  number  of  citations  to  articles  published  in  the  early  years  of  the  sample  are  low. 




distribution of Z_i. However, the cost of obtaining the month of publication of all of
these  articles  is  prohibitive,  and  I  shall  assume  there  is  no  distinction  based  on  time 
of  publication  except  for  the  calendar  year.  This  will  add  only  random  errors  to  the 
calculations. 

Table  A-1  shows  the  number  of  citations  that  occurred  each  year  for  all  the 
journal  articles  in  the  sample.  The  articles  published  between  1966  and  1969  have 
approximately  the  same  time  pattern  of  citations.  The  articles  published  in  1970 
have  a  larger  proportion  of  citations  in  the  year  of  publication  and  too  few  citations 
in the second year after publication.^2 Much of this difference can be explained by
the  fact  that  most  of  the  1970  articles  were  published  before  June. 


Table A-1

TIME PATTERN OF CITATIONS BY YEAR OF PUBLICATION

                                              Years Since Publication
                                     0       1       2       3       4       5       6

Articles published in 1966
  Total citations                                 1591    1480    1379    1313    1135
  Most-cited (percent)                            20.8    20.3    19.9    22.1    16.8
  Other (percent)                                 24.3    22.1    20.0    17.4    16.3

Articles published in 1967
  Total citations                         2044    2657    2474    2502    2164
  Most-cited (percent)                    15.1    20.4    20.9    23.0    20.6
  Other (percent)                         18.2    23.3    20.9    20.3    17.2

Articles published in 1968
  Total citations                  418    2118    2856    2904    2551
  Most-cited (percent)             2.6    17.2    25.1    28.9    26.2
  Other (percent)                  4.5    20.8    27.0    25.6    22.1

Articles published in 1969
  Total citations                  415    2100    2839    2702
  Most-cited (percent)             4.6    24.6    34.6    36.8
  Other (percent)                  5.4    27.1    35.6    31.9

Articles published in 1970
  Total citations                  198     817     843
  Most-cited (percent)             8.1    39.8    52.2
  Other (percent)                 11.6    45.5    43.0

TIME  PATTERN  OF  CITATIONS  OF  MOST-CITED  ARTICLES 

The distribution of citations over time depends on T, as is shown in Table A-1,
which compares the time pattern of citations of the most-cited 5 percent of each
year’s articles with that of the remaining articles published the same year. For each year
the time patterns of the two sets of articles differ at the 0.01 level of significance. The highly
cited articles receive a greater proportion of their citations after a longer time has passed.


^2 A Chi-square test comparing citations that occurred in years 0 through 2 after publication of articles
published  in  1968,  1969,  and  1970  yields  a  value  of  42.94  with  4  degrees  of  freedom  (significant  at  the 
0.001  level);  the  same  test  comparing  only  1968  and  1969  shows  no  difference  between  them. 




TIME  PATTERN  OF  CITATIONS  BY  SCIENTIFIC  FIELDS 

I  next  consider  whether  the  time  from  publication  until  citation  is  different  for 
different  scientific  fields.  I  use  the  study  section  that  reviewed  the  grant  as  the 
definition  of  a  scientific  field  for  this  purpose  and  analyze  articles  from  the  four  study 
sections  that  have  the  highest  number  of  average  citations  per  article.  Each  of  these 
four study sections is treated separately.^3 For each article published under a grant
from that study section, I choose at random, from all articles published under grants
from study sections other than the four selected, an article published in the same year with the
same total number of citations or as close to this number as possible. This sampling
strategy  allows  me  to  control  for  T,  and  any  differences  in  citation  pattern  between 
the  two  sets  of  articles  should  be  due  to  the  scientific  field  of  the  study  section. 

For each study section, Table A-2 shows the percentage of citations that occurred
i years after publication for its articles and the controls. For the first three study
sections the Chi-square test shows no difference in pattern. For study section 4
(PHRA) the value is significant at the 0.02 level. However, even here the magnitude
of the difference in the fraction of total citations that occur each year is very
small. Therefore, I conclude that the distribution of Z_i given T does not depend in
an important way on scientific fields.
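
The significance levels quoted for these comparisons can be checked from the Chi-square statistics themselves. The Python sketch below is an added illustration, not the original computation. It assumes that the last column of Table A-2 (8.9, 4.6, 5.5, and 15.2) holds Chi-square statistics with 6 degrees of freedom, and it uses the closed-form upper-tail probability that exists when the number of degrees of freedom is even. The function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X >= x) for a Chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Values read from Table A-2; treating them as Chi-square statistics
    # with 6 degrees of freedom is an assumption.
    for section, statistic in [(1, 8.9), (2, 4.6), (3, 5.5), (4, 15.2)]:
        print(section, statistic, chi2_upper_tail_even_df(statistic, 6))
```

Under this reading, only the statistic for study section 4 falls below the 0.02 level, and the other three are well above conventional significance levels, which agrees with the conclusions stated in the text.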


Table A-2

PERCENTAGE DISTRIBUTION OVER TIME OF CITATIONS BY SCIENTIFIC FIELDS

                              Years After Publication                 Number of   Chi-square
                        0      1      2      3      4      5      6   Citations     (6 df)

Study section 1       3.2   18.7   29.3   24.3   14.1    7.9    2.5      2730         8.9
  Control             2.5   16.6   28.8   25.9   15.5    8.0    2.8      2697

Study section 2       2.8   19.2   26.0   24.2   14.3    9.8    3.8      1196         4.6
  Control             2.4   17.5   28.9   23.2   15.6    9.1    3.3      1139

Study section 3       2.5   15.9   26.4   23.0   17.9    9.2    4.9      1577         5.5
  Control             2.6   17.7   24.2   24.0   18.7    7.8    5.1      1560

Study section 4       1.9   15.8   27.3   26.8   16.0    9.0    3.2      2999        15.2
  Control             2.6   18.9   26.8   25.3   14.7    8.6    3.1      3001

SPECIFICATION  OF  THE  MODEL 

The next question to be addressed is the specification of the model used to
estimate T given a subset of the Z_i. After some preliminary analysis I decided to
estimate each missing Z_i separately rather than estimate the sum directly. The
missing Z_i are estimated from the relationship seen between the missing Z_i and the
available Z_j in a year’s data where both are available. For example, for an article


^3 The four study sections are: Allergy and Immunology (ALY), Biochemistry (BIO), Physiological
Chemistry  (PC),  and  Pharmacology  A  (PHRA). 




published in 1966 an estimate of Z_1 is found from the 1967 data by regressing Z_1 on
functions of Z_2, Z_3, Z_4, and Z_5, which are available for articles published in both 1966
and 1967. The coefficients of the regression on 1967 data are then used to derive an
estimate of Z_1 for each article published in 1966.

After preliminary analysis, the following form of the regression equation was
chosen:

    Z_i = a_0 + a_1 Z_k + a_2 Z_k^2 + a_3 \sum_{j \ne k} Z_j + a_4 \left( \sum_{j \ne k} Z_j \right)^2 ,    (A.1)

where k is the year closest to i for which data are available, and the summations
extend over all other years for which data are available. By singling out the closest
year for which data are available, this form allows the prediction to depend not just
on the total citations seen in the data but also on the time pattern in which they
occurred. An equation that allowed each available Z_j to enter separately into the
equations was tried, but the coefficients for all years but the last were statistically
indistinguishable, and no improvement in predictive power was observed. Higher-order
polynomials were also tried. A quartic equation appeared to give slightly
better predictions, but because it was unstable near the high end of the range of the
regression data, I prefer the quadratic form.
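
As an added illustration, the quadratic form can be evaluated as follows. The sketch assumes the form suggested by the estimate equations given later in this appendix: a constant, linear and squared terms in the count for the closest available year k, and linear and squared terms in the sum of the counts for the other available years. The coefficient values in the usage example are hypothetical and are not taken from the report; the function names are also hypothetical.

```python
from typing import Mapping, Sequence


def closest_lag(target_lag: int, available_lags: Sequence[int]) -> int:
    """Available lag k closest to the lag being predicted."""
    return min(available_lags, key=lambda lag: abs(lag - target_lag))


def predict_missing_count(observed: Mapping[int, float], k: int,
                          coeffs: Sequence[float]) -> float:
    """Evaluate a_0 + a_1*Z_k + a_2*Z_k^2 + a_3*S + a_4*S^2.

    S is the sum of the observed counts at all available lags other than k.
    `observed` maps lag -> citation count; `coeffs` is (a_0, ..., a_4).
    """
    a0, a1, a2, a3, a4 = coeffs
    z_k = observed[k]
    s = sum(count for lag, count in observed.items() if lag != k)
    return a0 + a1 * z_k + a2 * z_k ** 2 + a3 * s + a4 * s ** 2


if __name__ == "__main__":
    # Hypothetical article observed at lags 0-2; predict the count at lag 3.
    observed = {0: 1.0, 1: 3.0, 2: 4.0}
    k = closest_lag(3, list(observed))
    illustrative_coeffs = (0.05, 0.5, 0.01, 0.4, -0.01)  # not from the report
    print(k, predict_missing_count(observed, k, illustrative_coeffs))
```

Keeping Z_k separate from the sum is what lets two articles with the same total but different recent counts receive different predictions.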


VALIDATION  OF  INDIVIDUAL  EQUATIONS  OF  MODEL 

For some of these equations it is possible to use the available data to estimate
the worth of this model as a predictor of the Z_i by using the regression equation from
one year’s data to predict Z_i in another year where Z_i is available. I present the
results of two such evaluations.

The  first  case  is  the  number  of  citations  seen  in  the  third  year  after  publication 
estimated  from  citations  seen  in  years  0  through  2.  In  this  case  articles  from  1968 
are  used  to  estimate  the  regression  equation,  and  the  evaluation  is  on  the  articles 
published  in  1969.  The  second  case  is  the  citations  seen  five  years  after  publication 
estimated  from  citations  seen  in  years  2  and  3.  The  data  used  to  estimate  the 
regression  consist  of  articles  published  in  1966,  and  the  evaluation  is  on  articles 
published  in  1967. 

For the second case I also consider an alternative model. I first estimate Z_1, the
number of citations seen in the year following publication, from the citations seen
in years two through four in the 1968 data. I then estimate Z_1 for each 1966 article
(the number of citations that would have been seen in 1967) and include this in the
regression estimate of citations seen five years after publication. This may improve
the prediction for the 1967 data, where actual data on citations one year after
publication are available.

The regression equation for the first case is:

    Z_3 = a_0 + a_1 Z_2 + a_2 Z_2^2 + a_3 (Z_0 + Z_1) + a_4 (Z_0 + Z_1)^2 .

For the second case there are two alternative specifications:

Estimate 1:

    Z_5 = a_0 + a_1 Z_3 + a_2 Z_3^2 + a_3 Z_2 + a_4 Z_2^2 .

Estimate 2:

    Z_1 = a_0 + a_1 Z_2 + a_2 Z_2^2 + a_3 (Z_3 + Z_4) + a_4 (Z_3 + Z_4)^2

    Z_5 = a_0 + a_1 Z_3 + a_2 Z_3^2 + a_3 (Z_2 + \hat{Z}_1) + a_4 (Z_2 + \hat{Z}_1)^2 ,

where \hat{Z}_1 is the estimate of Z_1 for an article published in 1966.

The  evaluation  of  these  predictions  is  shown  in  Table  A-3.  In  all  cases  the 
standard  error  of  the  prediction  is  quite  small  and  close  to  the  standard  error  of  the 
regression  equation.  This  indicates  that  the  time  lag  structure  is  similar  from  year 
to  year  and  that  enough  data  are  available  in  our  sample  from  a  single  year  to 
estimate the time pattern. Estimate 2 performs slightly worse than Estimate 1 in
predicting Z_5. Because of this test, and because it is computationally easier, no
estimated values are used as predictors anywhere in the regression equations.
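
The evaluation measures reported for these predictions can be sketched in Python. The report does not define them explicitly, so the definitions below are assumptions: bias is taken as the mean of predicted minus actual, the standard error of prediction as the root mean squared error, and the fraction of variance explained as one minus the ratio of mean squared error to the variance of the actual values. The function names are hypothetical.

```python
import math
from typing import Sequence


def evaluate_predictions(actual: Sequence[float],
                         predicted: Sequence[float]) -> dict:
    """Bias, standard error of prediction, and fraction of variance explained."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    bias = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    variance = sum((a - mean_actual) ** 2 for a in actual) / n
    if variance == 0:
        raise ValueError("actual values have zero variance")
    return {
        "bias": bias,
        "standard_error": math.sqrt(mse),
        "fraction_explained": 1.0 - mse / variance,
    }


def fraction_explained(standard_error: float, standard_deviation: float) -> float:
    """Fraction of variance explained implied by a standard error and a standard deviation."""
    return 1.0 - (standard_error / standard_deviation) ** 2


if __name__ == "__main__":
    print(evaluate_predictions([0, 2, 4, 6], [1, 2, 4, 5]))
    # Standard error 2.08 against standard deviation 3.73 (Estimate 1, 1967 data).
    print(round(fraction_explained(2.08, 3.73), 2))
```

With a standard error of prediction of 2.08 and a standard deviation of 3.73, this definition gives a fraction of about 0.69, in line with the value reported for Estimate 1.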


Table A-3

EVALUATION OF THE ERROR IN PREDICTION
OF INDIVIDUAL EQUATIONS

                                                  Second case,  Second case,  Second case,
                                    First case    Estimate 1    Estimate 2    Estimate 2

Years between publication
  and citation                           3             5             1             5

Data used for fit                       68            66            68            66
  Number of data points               1264           624          1265           624
  Mean                                2.30          2.10          1.68          2.10
  Standard deviation                  4.63          4.39          2.95          4.39
  Standard error                      2.37          2.34          1.70          2.30
  r^2                                 0.74          0.72          0.67          0.73

Data used for prediction                69            67            66            67
  Number of data points               1065          1143           664          1143
  Mean                                2.54          1.89           --a          1.89
  Standard deviation                  4.94          3.73           --           3.73
  Bias                               -0.16         -0.04           --           0.09
  Standard error                      2.35          2.08           --           2.19
  Fraction of variance explained      0.78          0.69           --           0.65

a Unknown.




TIME  PATTERN  OF  CITATIONS  BY  TYPE  OF  PUBLICATION 

Another  question  to  be  answered  is  whether  the  time  between  publication  and 
citation  differs  for  different  types  of  publications.  The  publications  in  the  sample  are 
classified  into  journal  articles,  books,  theses,  and  talks  at  professional  meetings  and 
published abstracts. Table A-4 shows the distribution of citations by year of publication
and citation for each class of publication. There are not enough data to come
to any conclusion about the time pattern of citations of theses. The remaining three
categories appear to have different citation patterns.^4 Citations of books tend to
occur  later  than  citations  of  journal  articles,  and  citations  of  talks  and  abstracts  tend 
to  occur  sooner  than  citations  of  journal  articles. 


Table A-4

TIME PATTERN OF CITATIONS BY TYPE OF PUBLICATION

                               Years between Publication and Citation
                               0       1       2       3       4       5       6

1966 Publications
  Journal articles                          1591    1480    1379    1313    1135
  Books                                        9      18      21      23      17
  Talks and abstracts                         24      15      22       5      14
  Theses                                       0       0       0       1       1

1967 Publications
  Journal articles                  2044    2657    2474    2502    2164
  Books                               27      31      37      40      30
  Talks and abstracts                 80      52      38      31      19
  Theses                               2       0       1       0       1

1968 Publications
  Journal articles           418    2118    2856    2904    2551
  Books                       25      66      79     103     103
  Talks and abstracts         24      83      72      51      50
  Theses                       0       0       0       1       2

1969 Publications
  Journal articles           415    2110    2839    2702
  Books                        3      30      53      65
  Talks and abstracts         21      77      38      33
  Theses                       0       1       5       4

1970 Publications
  Journal articles           198     817     843
  Books                        9      43      71
  Talks and abstracts          5      14       8
  Theses                       0       0       0

This implies that separate regression equations should be estimated for Z_i for
each type of publication. Before accepting this decision I evaluated the ability of
separate equations to predict citations. Using 1966 publications I estimated the
regression for Z_5 separately for journal articles, books, and talks and published
abstracts. In Table A-5 the ability to predict Z_5 for 1967 publications is shown. For
both books, and talks and abstracts, the value of Z_5 is better predicted from citations
of journal articles than from citations of publications of the same type. The large
variance in the estimate of the coefficients of the regression equations due to the
small number of citations of books and talks outweighs the unknown bias in the
estimated coefficients of the regression equation from journal articles. Therefore, I
shall not model the time pattern of citations separately for each publication type.

^4 For each year’s publications, Chi-square tests showed a significantly different pattern of time for the
three kinds of publications at the 0.03 level or better.


Table A-5

EVALUATION OF SEPARATE REGRESSIONS
FOR EACH TYPE OF PUBLICATION

Prediction for:                    Books       Books       Talks and   Talks and
                                                           Abstracts   Abstracts

Prediction from:                   Journal     Books       Journal     Talks and
                                   Articles                Articles    Abstracts

Mean                                0.50        0.50        0.07        0.07
Bias                               -0.05       -0.22       -0.10        0.03
Standard error of prediction        0.93        1.31        0.31        0.33

FINAL  ESTIMATES 

The  final  regression  equations  are  based  on  all  the  publications  in  the  sample 
and use equation (A.1) of this section. These equations are shown in Table A-6.


VARIANCE  IN  THE  ESTIMATE  OF  T:  METHODOLOGY 

The  next  question  to  be  addressed  is  the  variance  in  the  estimate  of  T,  the  total 
number  of  citations  occurring  in  the  six  years  following  publication,  as  a  function 
of  the  year  of  publication.  These  estimates  are  necessary  before  one  can  decide  how 
useful  the  values  of  T  can  be  as  the  measure  of  research  output.  However,  they  also 
should  be  of  interest  to  those  who  are  designing  further  experiments  using  citations. 
If one assumes that six years of citation data are enough to evaluate the short-term
effects  of  research,  then  these  estimates  of  the  error  in  T  will  show  the  loss  incurred 
in  obtaining  fewer  than  six  years  of  data. 

The variance of the estimate of T is derived as follows. Let E_i(i - k) denote
the error in an estimate of Z_i from Z_{i-k}, Z_{i-k-1}, ..., Z_0, and V_i(i - k) the variance
of E_i(i - k). The previous section has shown that the standard error of prediction
of  individual  equations  is  very  close  to  the  standard  error  of  the  estimate  from  the 
regression  equation.  Therefore  I  assume  that  the  variance  of  the  estimate  from  the 


Table A-6

REGRESSION EQUATIONS FOR CITATIONS PER YEAR

Dependent variable                  Z_0      Z_0      Z_1      Z_3      Z_4      Z_4

Year for which estimate
  is to be used                      66       67       66       70       70       69
Year from which estimate
  is derived                         68       68       67       69       68       68
Lag k                                 2        1        2        2        2        3
Summation extends over lags         3-4      2-4      3-5      0-1      0-1      0-2
r^2                                0.19     0.26     0.60     0.79     0.62     0.73
Standard error of estimate         0.69     0.66     1.68     2.03     2.28     1.93
a_0                               0.079    0.056    0.329    0.050   -0.033   -0.055
a_1                              0.071^   0.142a   0.130a   0.550a   0.585a   0.516a
a_2                              0.0001  -0.0000   0.014a   0.011a   0.004a   0.002a
a_3                              0.029a  0.0095a   0.175a   0.390a   0.465a   0.246a
a_4                            -0.0005^  -0.0001 -0.0016a  -0.011a  -0.015a  -0.003a

Dependent variable                  Z_5      Z_5      Z_5      Z_6      Z_6      Z_6      Z_6

Year for which estimate
  is to be used                      68       69       70       67       68       69       70
Year from which estimate
  is derived                         67       67       67       67       66       66       66
Lag k                                 4        3        2        5        4        3        2
Summation extends over lags         1-3      1-2        1      2-4      2-3        2     none
r^2                                0.79     0.71     0.64     0.78     0.76     0.69     0.66
Standard error of estimate         1.54     1.83     2.04     1.53     1.63     1.85     1.93
a_0                               0.004    0.040    0.124    0.032    0.096    0.140    0.263
a_1                              0.255a   0.380a   0.365a   0.472a   0.335a   0.315a   0.547a
a_2                              0.011^   0.013a   0.020a  -0.003b   0.008a  -0.004a   0.263a
a_3                              0.183^   0.194^   0.370^   0.121^   0.182a   0.335a
a_4                            -0.0004^  -0.0003  -0.015a    0.000  -0.001a   0.007a

a Significant at the 0.01 level.
b Significant at the 0.05 level.


regression equation is a reasonable estimate of V_i(i - k) and estimate the covariance
of E_i(i - k) and E_{i+j}(i - k) from a linear model that assumes

    Z_{i-1} = a_{i-1,i-2} Z_{i-2} + a_{i-1,i-3} Z_{i-3} + ... + \epsilon_{i-1}

    Z_{i-2} = a_{i-2,i-3} Z_{i-3} + ... + \epsilon_{i-2}

etc.,

where the \epsilon_j are assumed to be N(0, \sigma_j^2) and Cov(\epsilon_i, \epsilon_j) = 0 for i \ne j. For example,
E_i(i - 2) = a_{i,i-1} \epsilon_{i-1} + \epsilon_i, and the covariance of E_i(i - 2) and E_{i-1}(i - 2) is
a_{i,i-1} \sigma_{i-1}^2; and therefore the variance in the estimate of Z_i + Z_{i-1} from
Z_{i-2}, ..., Z_0 is given by:

    V_i(i - 2) + (2 a_{i,i-1} + 1) V_{i-1}(i - 2) .
In  a  similar  fashion  one  can  determine  Vi(i  —  3)  and  so  on.  The  equations 
required  to  estimate  the  standard  error  associated  with  each  year’s  estimate  of  T  are 
displayed in Table A-7. The coefficients a_{ij} used to produce the data in Table 24 were
estimated  from  the  above  linear  model.  The  size  of  the  error  in  the  estimate  of  T  is 
discussed  in  Section  V  of  this  report. 
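
The two-term variance formula above can be verified step by step. The derivation below is an added worked example, not text from the report. It assumes that the linear model extends to Z_i in the same way as for the earlier years, that is, Z_i = a_{i,i-1} Z_{i-1} + a_{i,i-2} Z_{i-2} + ... + \epsilon_i, and it writes E_i for E_i(i - 2) and V_i for V_i(i - 2).

```latex
\begin{align*}
E_{i-1} &= \epsilon_{i-1},
  & V_{i-1} &= \sigma_{i-1}^2, \\
E_i &= a_{i,i-1}\,\epsilon_{i-1} + \epsilon_i,
  & V_i &= a_{i,i-1}^2\,\sigma_{i-1}^2 + \sigma_i^2, \\
\operatorname{Cov}(E_i, E_{i-1}) &= a_{i,i-1}\,\sigma_{i-1}^2 = a_{i,i-1}\,V_{i-1}, \\
\operatorname{Var}(E_i + E_{i-1})
  &= V_i + V_{i-1} + 2\operatorname{Cov}(E_i, E_{i-1}) \\
  &= V_i + (2\,a_{i,i-1} + 1)\,V_{i-1}.
\end{align*}
```

The same expansion, applied with one more unobserved year at a time, produces recursive covariance expressions of the kind needed when three or four years must be estimated.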


Table A-7

SPECIFICATION OF VARIANCE IN ESTIMATE OF T

Year   Estimation of             Specification of Variance of Estimate of Sum

1966   Z_0 + Z_1                 V_66 = V_0(2) + (2 a_{01} + 1) V_1(2)

1967   Z_0 + Z_6                 V_67 = V_6(5) + V_0(1)   (since a_{60} is assumed to be 0)

1968   Z_6 + Z_5                 V_68 = V_6(4) + (2 a_{65} + 1) V_5(4)

1969   Z_6 + Z_5 + Z_4           Cov(E_5(3), E_4(3)) = a_{54} V_4(3)
                                 Cov(E_6(3), E_4(3)) = a_{65} Cov(E_5(3), E_4(3)) + a_{64} V_4(3)
                                 Cov(E_6(3), E_5(3)) = a_{64} Cov(E_5(3), E_4(3)) + a_{65} V_5(3)
                                 V_69 = V_6(3) + V_5(3) + V_4(3)
                                        + 2 \sum_{4 \le j < i \le 6} Cov(E_i(3), E_j(3))

1970   Z_6 + Z_5 + Z_4 + Z_3     Cov(E_4(2), E_3(2)) = a_{43} V_3(2)
                                 Cov(E_5(2), E_3(2)) = a_{54} Cov(E_4(2), E_3(2)) + a_{53} V_3(2)
                                 Cov(E_6(2), E_3(2)) = a_{65} Cov(E_5(2), E_3(2))
                                        + a_{64} Cov(E_4(2), E_3(2)) + a_{63} V_3(2)
                                 Cov(E_5(2), E_4(2)) = a_{54} V_4(2) + a_{53} Cov(E_4(2), E_3(2))
                                 Cov(E_6(2), E_4(2)) = a_{65} Cov(E_5(2), E_4(2)) + a_{64} V_4(2)
                                        + a_{63} Cov(E_4(2), E_3(2))
                                 Cov(E_6(2), E_5(2)) = a_{65} V_5(2) + a_{64} Cov(E_5(2), E_4(2))
                                        + a_{63} Cov(E_5(2), E_3(2))
                                 V_70 = \sum_{i=3}^{6} V_i(2)
                                        + 2 \sum_{3 \le j < i \le 6} Cov(E_i(2), E_j(2))


Appendix  B 

THE  GROUPING  OF  GRANTS  BY  SCIENTIFIC  FIELD 


In  a  scientific  field  that  contains  only  a  small  number  of  researchers,  the  number 
of  citations  per  article  may  be  smaller  than  in  a  large  field.  Therefore,  the  grants 
should  be  grouped  into  large  sets  such  that  each  contains  all  the  grants  in  scientific 
fields  of  approximately  the  same  size. 

As  a  starting  point  I  used  the  principal  investigator’s  department  in  the  medical 
school  as  a  surrogate  for  scientific  field.  The  primary  source  of  information  is  the 
department  field  on  the  IMPAC  file.  In  cases  where  this  field  did  not  contain  the 
name of a department of a medical school (e.g., it occasionally contained “none,”
“research,” or names of research institutes), school catalogs or biographical sources
were consulted to obtain the department to which the principal investigator belonged
in 1967.

The  differing  structures  of  medical  schools  required  some  decisions  on  grouping 
the  actual  departments  of  the  schools.  The  medical  subspecialties  of  cardiology, 
dermatology,  hematology,  and  neurology  were  combined  with  medicine.  All  the 
surgical  subspecialties  were  combined  with  surgery.  Departments  of  biochemistry 
and biophysics were combined into a single category. Departments labeled “Physiology
and Biophysics” or “Physiology and Pharmacology” were placed in the physiology
group. The public health category is a combination of departments of environmental
medicine, international health, preventive medicine, and public health.
Physical  and  rehabilitation  medicine  includes  departments  of  audiology  and  of 
bioengineering. 

For  25  of  the  798  grants,  the  IMPAC  file  listed  two  distinct  departments.  These 
grants  were  assigned  to  the  first  of  the  two  departments. 

Table  B-1  shows  the  average  number  of  citations  per  article  for  each  medical 
school department. Within the basic science departments it is clear that biochemistry,
microbiology, and pharmacology can be grouped together and that anatomy is
quite  different.  Physiology,  which  lies  about  halfway  between  anatomy  and  the  other 
basic  sciences  in  citation  rate,  presents  a  problem.  Here  it  is  combined  with  the  basic 
science  departments  other  than  anatomy. 

Within the clinical sciences the small fields of forensic medicine, physical medicine,
public health, and radiology all have very low citation rates and will be
treated separately. Psychiatry is also placed with this group. The two largest departments,
surgery and medicine, are different from each other and form the nuclei for
two  additional  groups.  On  the  basis  of  citation  rates  I  assign  pathology  to  the  medical 
group.  Results  from  the  grants  in  the  OB-GYN  departments  were  published 
predominantly  in  journals  devoted  to  endocrinology.  The  most  frequently  chosen 
journal  was  the  American  Journal  of  Obstetrics  and  Gynecology,  but  the  second, 
third,  and  fourth  choices  were  Endocrinology,  Journal  of  Clinical  Endocrinology, 
and  Steroids.  Therefore,  the  grants  from  the  OB-GYN  department  were  placed  in 
the medical group. In the belief that research in pediatrics will bear more resemblance
to research in medicine than to research in surgery, these grants were placed
in the medical group. The grants from anesthesiology were placed in the surgical
group. The definitions of groups used here are summarized in Table 25 in the text.
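
The effect of this grouping on citation rates can be illustrated by pooling the departmental figures of Table B-1. The Python sketch below is an added illustration, not a computation from the report. The group labels are hypothetical names for the groupings described above, and the pooled rate is taken as the article-weighted mean of the departmental averages, which is an assumption about how a group-level rate would be formed.

```python
# (grants, articles, average citations per article), transcribed from Table B-1.
TABLE_B1 = {
    "Anatomy": (51, 202, 7.62),
    "Biochemistry and biophysics": (75, 262, 16.04),
    "Microbiology and immunology": (52, 157, 18.46),
    "Pharmacology": (43, 261, 18.05),
    "Physiology": (70, 271, 12.98),
    "Pathology": (66, 372, 14.19),
    "Forensic medicine": (1, 32, 3.08),
    "Medicine": (204, 1554, 14.10),
    "Obstetrics and gynecology": (23, 63, 14.95),
    "Pediatrics": (40, 260, 10.51),
    "Physical and rehabilitation medicine": (6, 13, 4.63),
    "Psychiatry": (12, 64, 2.73),
    "Public health": (10, 53, 6.67),
    "Radiology": (21, 36, 6.28),
    "Surgery": (120, 544, 8.36),
    "Anesthesiology": (4, 26, 10.42),
}

# Group labels are illustrative names for the groupings described in the text.
GROUPS = {
    "anatomy": ["Anatomy"],
    "basic science": ["Biochemistry and biophysics", "Microbiology and immunology",
                      "Pharmacology", "Physiology"],
    "medical": ["Medicine", "Pathology", "Obstetrics and gynecology", "Pediatrics"],
    "surgical": ["Surgery", "Anesthesiology"],
    "small clinical fields": ["Forensic medicine", "Physical and rehabilitation medicine",
                              "Public health", "Radiology", "Psychiatry"],
}


def pooled_rate(departments: list[str]) -> float:
    """Article-weighted mean citations per article over the given departments."""
    articles = sum(TABLE_B1[d][1] for d in departments)
    weighted = sum(TABLE_B1[d][1] * TABLE_B1[d][2] for d in departments)
    return weighted / articles


GROUP_RATES = {name: pooled_rate(depts) for name, depts in GROUPS.items()}

if __name__ == "__main__":
    for name, rate in GROUP_RATES.items():
        print(f"{name:22s} {rate:6.2f}")
```

Pooled this way, the five groups are well separated, with the basic science group highest and the small clinical fields lowest, which is consistent with treating them as distinct sets.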


Table B-1

CITATION RATES BY MEDICAL SCHOOL DEPARTMENTS

                                                      Citations per Article
                                                                   Standard
Department                            Grants  Articles   Average  Deviation

Anatomy                                   51       202      7.62      13.29
Biochemistry and biophysics               75       262     16.04      22.35
Microbiology and immunology               52       157     18.46      27.31
Pharmacology                              43       261     18.05      28.11
Physiology                                70       271     12.98      18.60
Pathology                                 66       372     14.19      23.28
Forensic medicine                          1        32      3.08       3.52
Medicine                                 204      1554     14.10      23.72
Obstetrics and gynecology                 23        63     14.95      16.35
Pediatrics                                40       260     10.51      14.66
Physical and rehabilitation medicine       6        13      4.63       4.21
Psychiatry                                12        64      2.73       5.25
Public health                             10        53      6.67       9.76
Radiology                                 21        36      6.28       8.13
Surgery                                  120       544      8.36      16.67
Anesthesiology                             4        26     10.42      12.30
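
The grouping of departments described in this appendix can be illustrated with a short calculation. The following Python sketch pools the Table B-1 averages into one article-weighted citation rate per group. The group labels are hypothetical shorthand for the groups named in the text, and the pooled figures are an illustration only; they are not taken from the report and are not a reproduction of Table 25.

```python
# Rows of Table B-1: department -> (grants, articles, mean citations per article)
TABLE_B1 = {
    "Anatomy": (51, 202, 7.62),
    "Biochemistry and biophysics": (75, 262, 16.04),
    "Microbiology and immunology": (52, 157, 18.46),
    "Pharmacology": (43, 261, 18.05),
    "Physiology": (70, 271, 12.98),
    "Pathology": (66, 372, 14.19),
    "Forensic medicine": (1, 32, 3.08),
    "Medicine": (204, 1554, 14.10),
    "Obstetrics and gynecology": (23, 63, 14.95),
    "Pediatrics": (40, 260, 10.51),
    "Physical and rehabilitation medicine": (6, 13, 4.63),
    "Psychiatry": (12, 64, 2.73),
    "Public health": (10, 53, 6.67),
    "Radiology": (21, 36, 6.28),
    "Surgery": (120, 544, 8.36),
    "Anesthesiology": (4, 26, 10.42),
}

# ASSUMPTION: these labels are hypothetical names for the groups the text describes.
GROUPS = {
    "anatomy": ["Anatomy"],
    "other basic science": [
        "Biochemistry and biophysics",
        "Microbiology and immunology",
        "Pharmacology",
        "Physiology",
    ],
    "medical": [
        "Medicine",
        "Pathology",
        "Obstetrics and gynecology",
        "Pediatrics",
    ],
    "surgical": ["Surgery", "Anesthesiology"],
    "low-citation clinical": [
        "Forensic medicine",
        "Physical and rehabilitation medicine",
        "Public health",
        "Radiology",
        "Psychiatry",
    ],
}


def group_rate(departments):
    """Pooled citations per article: department means weighted by article counts."""
    articles = sum(TABLE_B1[d][1] for d in departments)
    citations = sum(TABLE_B1[d][1] * TABLE_B1[d][2] for d in departments)
    return citations / articles


GROUP_RATES = {name: group_rate(depts) for name, depts in GROUPS.items()}

for name, rate in sorted(GROUP_RATES.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {rate:6.2f}")
```

Weighting by articles rather than by grants is a choice made for this sketch: because each department average is a per-article figure, the article-weighted mean equals the average over all articles in the group.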

Appendix  C 

MAJOR  NIH  POLICY  ISSUES 


The  issues  of  the  level  of  biomedical  support  and  the  type  of  funding  mechanism 
have  been  important  policy  issues  since  the  foundation  of  NIH,  and  remain  so  today. 
This appendix briefly reviews the history of NIH with respect to these issues, identifies
the people who have been and are involved in setting NIH policy, and specifies
the  considerations  on  which  the  policy  decisions  have  been  made. 


FUNDING  LEVELS,  1947-1967 

The  20  years  following  the  close  of  World  War  II  were  years  of  very  rapid  growth 
in  the  level  of  support  of  biomedical  research  by  NIH.  Between  1947  and  1955,  the 
NIH budget grew tenfold, reaching $81 million;¹ it grew by another factor of 10 to
$880 million in 1963; by 1967 it had reached $1.3 billion.

During  these  years,  increases  in  the  NIH  budget  were  driven  by  Congress  rather 
than  by  the  executive  branch.  With  but  a  few  important  exceptions,  the  budget 
presented by the President to Congress was derived through an incremental increase
over the budget for the previous year, but Congress actually appropriated
much larger increases. Between 1950 and 1967, Congressional appropriations exceeded
the Presidential requests in every year except 1951, 1952, and 1964, and in
only  1951  did  the  NIH  budget  actually  decline. 

The  men  in  Congress  most  associated  with  the  rapid  rise  in  NIH  funding  were 
John Fogarty of the Subcommittee for Labor and HEW of the Committee on Appropriations
of the House, and Lister Hill, who had the corresponding position in the
Senate. Both men were Democrats and chaired these committees during the years
their party controlled Congress. Support for NIH budget increases was clearly bipartisan,
however, as shown by the support each received from ranking Republicans on
his  subcommittees.  In  addition,  because  of  the  permanent  authorization  of  unlimited 
funds  for  research  received  by  the  Public  Health  Service  in  the  Public  Health 
Service  Act  of  1944,  efforts  of  the  subcommittees  to  raise  NIH  spending  were  not 
hindered  by  the  existence  of  other  committees  normally  required  to  authorize  funds. 

Each  year  extensive  subcommittee  hearings  were  held  to  establish  the  evidence 
required  to  convince  the  parent  committee  and  legislative  body  of  the  advisability 
of raising NIH appropriations. NIH administrators, nongovernment medical scientists,
and lay people testified. These nongovernment witnesses constituted an effective
lobbying group, which demonstrated (and stirred up) public opinion in favor of
biomedical  research.  One  of  the  leaders  of  this  lobby  was  and  is  Mary  Lasker.  The 
testimony  at  these  public  hearings  contained  certain  predictable  elements  still  to  be 
found  in  the  annual  hearings. 

One  element  of  the  case  each  year  was  the  need  for  biomedical  research  as  shown 
by the devastating toll illness and death took on the nation; this was often demonstrated
by individual case histories. Statistics on deaths, disabilities, and income
losses attributable to diseases were presented. In early years these statistics had
often been compiled by the National Health Education Committee, Inc., financed by
the Lasker Foundation.

¹ This budget figure and succeeding ones are from Hearings, 1972, p. 64.

Another  crucial  element  in  the  argument  was  that  progress  toward  meeting  the 
need  for  biomedical  research  was  possible;  outstanding  progress  in  the  previous  year 
was  recited.  Often  hope  was  held  out  for  progress  in  particular  areas  in  the  coming 
year. 

The  care  exercised  by  NIH  in  the  administration  of  its  research  program  was 
also  pointed  out.  In  some  years  NIH  returned  small  sums  of  its  appropriations 
because  there  were  not  enough  scientifically  valid  projects  to  support.  However,  the 
training programs of NIH were rapidly increasing the numbers of biomedical scientists
who were capable of performing research, and in most years NIH could point
to  a  backlog  of  scientifically  meritorious  applications  that  could  not  be  funded.  Thus 
NIH  demonstrated  that  there  were  sufficient  trained  scientists  with  good  ideas  to 
use  expanded  funds.  None  of  the  funds  were  wasted;  they  were  being  used  to  build 
the  base  of  knowledge  required  to  solve  the  nation’s  health  problems. 

In  addition  to  NIH  testimony,  special  committees  were  occasionally  appointed 
to examine whether NIH funds were being used efficiently. A Committee of Consultants
(also known as the Jones Committee) was appointed in 1960; it included many
people who were already prominent in the biomedical research lobby² and who, as
Strickland (1972) put it, were "highly skilled practitioners of partisan analysis ...
[invited] ... to perform that analysis for a group of clients, most of whom hoped the
analysis would be partisan, and the rest of whom would be satisfied just because it
was analysis." The resulting recommendations for specific increases in NIH spending
were not surprising.

Early  opponents  of  the  rate  of  increase  in  NIH  funding  based  their  objections 
mostly  on  their  belief  that  no  enterprise  could  grow  at  the  rate  NIH  was  growing 
and  still  function  efficiently.  But  their  objections  were  made  without  the  specific 
details  required  to  meet  the  case  presented  yearly  by  the  proponents  of  the  NIH 
budget increases. Representative Lawrence H. Fountain, chairman of the Intergovernmental
Relations Subcommittee of the House Government Operations Committee,
pointed to one way of substantiating the case for slower growth. The Fountain
Committee report of June 1962 stated: "The pressure for spending increasingly
large appropriations has kept NIH from giving adequate attention to basic management
problems." Two immediate short-term effects can be attributed, at least in
part, to this first Fountain Committee report. In 1963, for the first time in a decade,
Congress appropriated less money for NIH than the President had requested; in
addition, President Kennedy called for a review of all NIH activities. Because of his
assassination, this study was not actually begun until 1964, when President Johnson
appointed the Woolridge Committee, which convened panels of scientific experts to
make first-hand evaluations of NIH research grants, program project grants, and
contracts. Although the Committee’s report (The White House, 1965) did have some
suggestions concerning the structure of NIH decisionmaking, its principal finding
was that the NIH budget was "being spent wisely and well in the public interest."
Reassured by this report, Congress again increased NIH funding for FY 1966 and
FY 1967.


² For example, Dr. Sidney Farber and Dr. Michael de Bakey.


FUNDING  LEVELS,  1968  TO  THE  PRESENT 

Appropriations  for  NIH  research  activities  declined  in  FY  1968  and  again  in  FY 
1969,  and  rose  only  imperceptibly  in  FY  1970.  This  reversal  of  NIH’s  fortunes  had 
many  causes,  among  them  the  loss  of  past  leaders:  John  Fogarty  died  in  January 
1967;  Lister  Hill  chose  to  retire  rather  than  run  for  reelection  in  1968;  and  Dr.  James 
A.  Shannon,  who  had  been  Director  of  NIH  since  1955,  also  retired  in  1968.  There 
was  a  growing  skepticism  about  the  value  of  all  research  and  development  programs, 
including  those  of  NASA  and  DOD,  as  well  as  of  biomedical  programs. 

Much  debate  in  this  period  focused  on  whether  Congress  had  "force-fed”  NIH 
to the point where money had been wasted. The Fountain Committee’s
November 1967 report was highly critical of NIH management and charged that
many projects being supported were of poor quality. In hearings on the FY 1968
budget, Senator Hill added $4 million to the budget for a particular drug study
without  first  discussing  the  study  with  NIH  personnel.  Unfortunately,  a  similar 
study  was  already  under  way,  and  the  incident  received  wide  publicity. 

To  some  extent  the  slowdown  reflected  newly  emerging  national  priorities.  The 
analog of Hitch and McKean’s question, "How much is needed for biomedical research
more than it is needed for other purposes?" was being considered for the first
time  since  the  Korean  War,  and  the  other  purposes  loomed  large — the  Vietnam 
War,  direct  support  of  education  in  the  health  professions,  and  the  Great  Society 
programs,  especially  Medicare  and  Medicaid.  In  addition  to  the  demands  these  new 
health  care  programs  placed  on  national  resources,  they  gave  members  of  Congress 
an  important  political  option:  to  be  for  health  programs  and  still  against  expansion 
of  biomedical  research. 

The  budgets  that  President  Nixon  presented  to  Congress  for  NIH  were  modest, 
in line with his general policy of fiscal conservatism. For 1970 and 1971 they were
slight decreases from the previous year’s appropriation. Although the budget for
1972  included  a  special  request  for  $100  million  for  the  war  on  cancer,  the  total 
request  for  NIH  research  programs  was  the  same  as  appropriated  the  previous  year; 
the  budgets  requested  for  FY  1973  and  FY  1974  show  increases  for  the  National 
Heart  and  Lung  Institute  (NHLI)  and  the  National  Cancer  Institute  (NCI),  but  the 
same  or  less  for  the  rest  of  NIH. 

But leaders of the appropriations subcommittees of Congress (Flood in the
House and Magnuson in the Senate) have begun to reassert the historical Congressional
role of increasing the NIH research budget. They succeeded in granting
appropriations for NIH institutes and divisions that exceeded the President’s requests
by $60 million in FY 1970, by $150 million in FY 1971, and by $142 million
in FY 1972. The continuing resolutions under which NIH institutes and divisions
were funded in FY 1973 included $140 million more than the President had requested.
Despite difficulties resulting from the disagreement over the President’s right to
withhold  money  in  the  continuing  resolutions,  there  is  every  sign  that  Congressional 
leadership  will  insist  on  significantly  increasing  1974  appropriations. 


HISTORY  OF  TARGETED  VERSUS  BASIC  RESEARCH 

The  problem  of  targeted  versus  basic  research  has  been  with  NIH  since  its 
founding. The naming of institutes after disease categories testifies to the Congressional
interest in finding solutions to disease problems. This categorical organization
was called by the Woolridge Committee a "scientifically inappropriate organizational
structure" for the performance of NIH’s real mission, which requires "NIH to
concentrate most of its efforts on basic research, leaving for directed, developmental
effort only those research discoveries which appear susceptible to exploitation." This
view  of  the  central  importance  of  basic  research  was  clearly  also  the  view  of  NIH 
director  James  Shannon,  who  shaped  much  of  NIH  policy  during  the  years  of  its 
growth.  The  more  sophisticated  members  of  Congress,  particularly  Fogarty  and  Hill, 
were  aware  of  the  importance  of  the  basic  research  enterprise.  However,  members 
of  the  research  lobby,  having  succeeded  in  raising  the  level  of  NIH  funding,  began 
during  the  1960s  to  pay  more  attention  to  programmatic  elements  and  to  push  for 
more targeted research. They occasionally got their way, as in the cancer chemotherapy
program.

During  the  last  few  years,  NIH  programs  have  in  fact  shifted  somewhat  more 
toward  targeted  research  and  away  from  the  more  traditional  research  grants.  The 
"collaborative  programs”  of  NIH,  which  are  funded  through  a  contract  mechanism, 
have grown much faster than research grants. The dollars awarded through these
programs increased 133 percent between 1968 and 1972, while dollars awarded through
regular research project grants increased only 21 percent (Hearings, 1973). There has also been an
increase  in  the  number  of  grants  awarded  for  research  centers  that  are  somewhere 
between  the  collaborative  programs,  where  the  research  program  is  directed  by  NIH 
administrators, and the traditional research grants, which are purely investigator-originated.
These new center programs include the Specialized Centers of Research
of NHLI, the Allergic Disease Centers of the National Institute of Allergy and
Infectious Diseases (NIAID), and the program of Cancer Research Institutes.

One  argument  in  favor  of  more  basic  research  is  based  on  a  view  of  the  nature 
of  science  as  an  incremental  process,  in  which  knowledge  is  accumulated  by  each 
scientist  as  he  follows  leads  developed  in  his  own  investigations;  that  is,  the  clues 
that  his  current  interest  and  knowledge  have  prepared  him  to  follow  will  lead  to 
useful  breakthroughs.  According  to  this  view,  research  targeted  on  specific  problems 
for  which  the  necessary  knowledge  base  has  not  yet  been  discovered  is  inherently 
wasteful  of  funds  and  scarce  scientific  energy.  Since  scientific  outcomes  are  by 
nature  not  predictable,  research  in  one  area  often  has  unforeseen  consequences  for 
health  care  in  another  area.  Another  argument  for  extensive  basic  research  is  the 
necessity to renew the knowledge base for use in future years. A still further argument
is based on the belief that only understanding of the fundamental life processes
will lead to solutions that are ultimately satisfactory. To support this belief,
its adherents point to the costly medical technology required to treat diseases that
are improperly understood. For example, the cost of operating rehabilitation centers
for polio victims before the development of the vaccine is analogous to the cost of
the treatment currently required by victims of kidney disease.

Proponents of targeted research believe that the time is ripe for more exploitation
of the knowledge base that has been built up since the close of World War II.
This  belief  has  been  reinforced  by  some  clear  successes  (for  example,  the  retrolental 
fibroplasia  collaborative  program).  It  is  often  accompanied  by  a  belief  that  much 
scientific  work  is  not  relevant  to  the  disease  categories. 

The  issue  of  targeted  versus  basic  research  has  often  been  behind  a  debate  over 
administrative mechanisms. In 1966 the new program of funding research centers
was seen by Dr. Shannon (1966) as an attempt to satisfy society’s demands for
purposeful  enterprise  and  still  retain  the  characteristics  of  a  system  that  places  a 
premium  on  the  individual  scientist’s  capabilities  and  on  what  he  perceives  as 
important. Other examples include the 1966 debate concerning whether the National
Advisory Council should review the contracts of NCI, and the attempt in 1971 to
set up a separate Cancer Authority outside of NIH. This attempt was opposed by
members of the biomedical research community (particularly Dr. Cooper of the
AAMC). Representative Rogers of the Subcommittee on Public Health and Environment
led a successful fight in the House to keep the cancer program within NIH. The
resulting legislation, the National Cancer Act of 1971, contained a specific authorization
for cancer research and may signify that legislative committees will join the
appropriations committees in shaping NIH policy in the future. In addition to
Representative Rogers, Senator Kennedy has expressed keen interest in NIH policy.

The  targeted  research  issue  also  appeared  in  the  recent  controversy  over  peer 
review. All grants awarded by NIH receive a double review, first for "scientific
merit" by an Initial Review Group (IRG) consisting of 12 to 15 scientists who are
experts in the particular field of the grant. This group votes either to recommend
approval or to recommend disapproval of the grant. For grants recommended for
approval, the IRG then assigns a priority score by averaging the priority scores
assigned by individual members of the group. The applications are then reviewed
for programmatic relevance and other considerations of national policy by the National
Advisory Council of the Institute to which the grant is assigned. These councils
include both lay and scientific people, and Institute officials are prevented by law
from  funding  grants  not  recommended  for  approval  by  the  councils.  Although  the 
Advisory  Councils  influence  decisions  about  the  funding  levels  of  programs  within 
the  Institute,  they  only  rarely  disagree  with  the  priority  scores  assigned  by  the  IRG 
to  particular  grants.  The  administrators  within  NIH  are  not  bound  by  law  to  follow 
the  priority  order  of  those  applications  recommended  for  approval  by  Advisory 
Councils,  but  in  practice  most  of  them  do  so.  Thus,  the  IRG  plays  a  very  large  role 
in  determining  which  grants  will  be  funded. 
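
The scoring and ranking steps just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of actual NIH procedure: the application names, scores, costs, and budget are hypothetical; a lower score is assumed to mean a better priority, since the text does not state the direction of the scale; and the Advisory Council review is omitted.

```python
from statistics import mean


def priority_scores(reviews):
    """Average the members' scores for each application recommended for approval.

    `reviews` maps an application id to (recommended, member_scores).
    Applications not recommended for approval receive no priority score.
    """
    return {
        app: mean(scores)
        for app, (recommended, scores) in reviews.items()
        if recommended
    }


def fund_in_priority_order(scores, costs, budget):
    """Fund approved applications in priority order until the budget runs out.

    ASSUMPTION: a lower score means a better priority.
    """
    funded = []
    for app in sorted(scores, key=scores.get):
        if costs[app] > budget:
            break  # strict priority order: stop at the first unaffordable grant
        funded.append(app)
        budget -= costs[app]
    return funded


# HYPOTHETICAL applications: (recommended for approval?, individual member scores)
reviews = {
    "A": (True, [1.5, 2.0, 2.5]),
    "B": (True, [1.0, 1.5, 2.0]),
    "C": (False, [4.0, 4.5, 5.0]),
    "D": (True, [3.0, 3.5, 4.0]),
}
costs = {"A": 40, "B": 50, "D": 30}  # hypothetical costs, arbitrary units

scores = priority_scores(reviews)
funded = fund_in_priority_order(scores, costs, budget=100)
unfunded = [app for app in scores if app not in funded]
print(funded, unfunded)
```

In this sketch application D is approved but unfunded, which is the kind of backlog the text says NIH could point to in most years.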

This  peer  review  system  is  based  on  the  view  that  science  is  so  complex  that  only 
working  scientists  have  enough  detailed  knowledge  to  judge  the  scientific  merit  of 
particular  grant  applications.  It  has  been  praised  by  many,  including  the  Woolridge 
Committee,  for  guaranteeing  the  high  quality  of  NIH-supported  research.  Criticisms 
of  the  system  are  summarized  in  Office  of  Management  and  Budget,  1973,  and 
include the statement, "the process is largely reactive to the initiative, interest, and
whims of individual researchers and, therefore, is not readily compatible with targeted
or directed research." Since this is listed as a criticism, one must assume that
the  critic  believes  NIH  grants  should  support  targeted  or  directed  research  rather 
than  the  individual  investigator-initiated  research  supported  through  grants  and 
the  peer  review  system. 

The OMB document also calls attention to a possible conflict of interest because
members of an IRG determine the allocation of research funds of which either they
or their institutions receive a part. Evidence supporting this charge includes the
concentration of NIH funds in a few select institutions and the fact that the system
"produces a large number of approved but unfunded applications that are often used
to support the need for more research funds," from which the same scientific community
will benefit. But the same evidence may also be seen as showing that the best
research is carried out in a few select institutions and that NIH funding levels have
not been adequate to support all scientifically meritorious proposals. Other objections
given in the OMB document assert that the system does not assure administrative
responsibility and that the scientific merit judgments are not subject to objective
assessments.


NIH  AND  MEDICAL  SCHOOLS 

The  rapid  growth  of  NIH  funding  described  earlier  in  this  section  had  a  profound 
effect  on  medical  schools  in  the  United  States.  Between  1951  and  1967,  the  number 
of all full-time medical faculty increased from 3,500 to 19,296. Although improvement
of medical education was not one of the explicit aims of the Congressional
leadership,  and  all  NIH  programs  were  clearly  for  the  support  of  either  research  or 
research  training,  the  Jones  Committee  reported  that  the  program  was  "largely 
responsible  for  an  improvement  in  standards  of  medical  training.”  The  Woolridge 
Committee  (The  White  House,  1965),  while  endorsing  the  view  that  the  quality  of 
instruction  had  been  enhanced,  also  noted  the  extensive  dependence  of  educational 
institutions on biomedical research funding and warned that withdrawal or substantial
curtailment of funding would be "disastrous."

Since  1967  a  new  major  force  has  been  emerging  in  the  political  arena:  the 
biomedical  research  community.  The  end  of  the  era  of  sharply  increasing  funds  has 
given  rise  to  much  discussion  of  the  role  of  biomedical  science  in  the  nation’s  welfare 
and in government policy, as many articles in Science testify.³ The research community
has also begun to function as an effective lobby under the leadership of Dr.
John  Cooper  of  the  Association  of  American  Medical  Colleges.  The  AAMC  survey 
of  the  effect  of  President  Nixon’s  1974  budget  on  medical  schools  was  presented 
promptly  to  the  House  and  Senate  appropriation  subcommittees. 

Medical  schools  objected  to  the  decline  in  research  support,  the  shift  to  targeted 
rather than basic research, and the cutbacks in Training Grants and General Research
Support Grants (GRSGs). The declines in research project grants and GRSGs
are alleged to be particularly hard on young faculty members. Some medical
school  faculty  members  say  that  a  young  principal  investigator  is  likely  to  receive 
a  poorer  priority  score  than  a  more  senior  person.  Because  many  schools  have  no 
funds  to  support  research,  and  because  research  performance  is  usually  necessary 
for  promotion  and  tenure,  some  fear  that  medical  school  faculties  will  become  older 
than  is  desirable. 


POLICY  RESEARCH  RELEVANT  TO  NIH 
Funding  Levels 

The process that results in NIH funding levels is a clear example of Lindblom’s
"disjointed incrementalism." The answer to the question, How much is needed for
biomedical research more than it is needed for other purposes? is and must be derived
from the political process, since people have different values and goals. This political
process, with its multitude of players and no single decisionmaker, results in decisions
that vary only incrementally from previous decisions. The role policy analysis
can play in this process is to illuminate the consequences of alternative choices. The
question for analysis is: What will be the output from an increment of dollars spent
on biomedical research?

³ For example, Dubridge, 1969; York, 1971.

The  output  of  research  that  has  improved  health  care  can  be  measured  in  terms 
of  (1)  number  of  deaths  prevented,  (2)  number  of  days  of  disability  prevented  (which 
itself  is  a  surrogate  for  pain  and  suffering),  (3)  the  market  value  of  the  production 
that was not lost due to death or disability, and (4) the reduced cost of treatment.
In his case study of poliomyelitis, Weisbrod (1971) chose to use only benefits (3) and
(4), which can be measured in dollar terms. The costs of the research included all
amounts awarded for polio research (determined from data from the Scientific Information
Exchange). He calculated the rate of return of the investment in research
under  various  assumptions  concerning  the  completeness  of  the  cost  data  and  the 
costs  of  vaccination. 
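
A rate-of-return calculation of this general kind can be sketched as follows. The sketch uses the internal rate of return, that is, the discount rate at which the present value of net benefits equals the present value of costs; whether this matches the definition Weisbrod used is an assumption. The yearly figures are hypothetical and chosen only to make the arithmetic visible; they are not Weisbrod's data.

```python
def npv(rate, flows):
    """Net present value of yearly net flows, with flows[0] falling in year 0."""
    return sum(flow / (1.0 + rate) ** year for year, flow in enumerate(flows))


def internal_rate_of_return(flows, lo=0.0, hi=1.0, tol=1e-9):
    """Find the rate at which NPV is zero, by bisection.

    Assumes NPV is positive at `lo` and negative at `hi`, as it is when
    costs come first and benefits arrive later.
    """
    if npv(lo, flows) <= 0 or npv(hi, flows) >= 0:
        raise ValueError("no sign change of NPV on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid, flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# HYPOTHETICAL yearly figures in millions of dollars, for illustration only.
research_costs = [10.0, 10.0, 0.0, 0.0, 0.0, 0.0]
vaccination_costs = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
# Benefits (3) and (4): production not lost plus reduced treatment cost.
benefits = [0.0, 0.0, 0.0, 15.0, 15.0, 15.0]

net_flows = [
    b - r - v for b, r, v in zip(benefits, research_costs, vaccination_costs)
]
rate = internal_rate_of_return(net_flows)
print(f"internal rate of return: {rate:.1%}")
```

Rerunning the calculation with different cost series corresponds to the "various assumptions" about the completeness of the cost data and the costs of vaccination that the text mentions.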

Although  this  type  of  research  activity  is  quite  valuable  and  deserves  repetition 
for  other  cases,  it  is  extremely  difficult  to  estimate  the  cost  of  the  research  activity 
that  produced  the  given  improvement  in  health  care.  Because  basic  research  is  a 
cumulative enterprise, it is often difficult to determine the series of research activities
that actually led to the health care improvement. Some activities may also have
led down other paths to other improvements in health care. Even if one could trace
through  the  scientific  maze  to  determine  after  the  fact  exactly  which  activities  were 
required,  the  unpredictable  nature  of  scientific  outcomes  means  that  research  costs 
must  include  other  research  activities  that  had  to  be  supported  in  order  to  find  the 
path  that  led  to  success.  Therefore,  until  we  find  better  methods  for  estimating  the 
cost  of  the  research  activity  required  for  a  given  improvement  in  health  care,  other 
similar  case  studies  will  deal,  as  Weisbrod’s  did,  with  aggregate  costs  and  thus  focus 
on  the  return  on  the  average  dollar  invested,  rather  than  on  the  marginal  dollar 
invested.  What  would  the  outcome  have  been  if  investment  in  polio  research  had 
been at a different level? Such studies, though exceptionally valuable, cannot address
this question.

Another  problem  in  attempting  to  use  health  care  as  a  measure  of  research 
output  is  the  time  frame  involved.  Many  of  the  health  care  effects  of  basic  research 
are  not  seen  or  even  guessed  at  for  many  years. 

All  these  considerations  have  led  us  to  examine  other  criteria  that  may  help 
clarify the question of what the level of NIH funding should be. Berliner and Kennedy
(1970) offer a short discussion of many criteria that could be used to determine
the  level  of  biomedical  research  funding.  They  discuss  cost-benefit  studies,  as  well 
as  criteria  based  on  a  fixed  percentage  of  GNP,  or  of  health  care  expenditures. 
Although  such  rules  have  the  advantage  of  simple  implementation,  there  is  no  clear 
rationale for selecting a particular percentage figure. They conclude with the observation
that started this section: The final judgment must be political.

A Committee of the AAMC recommended "that the national policy for biomedical
research assure support at levels sufficient to engage all well qualified brainpower."⁴
In the long run, this cannot be used as a guide because the number of qualified
scientists may be influenced by national policy. In the short run, it is clearly an
appropriate upper bound on the level of support. To fund at that level requires the
faith held by many, including the Woolridge Committee, that if the NIH research
program "is steadily effective in increasing the quality and quantity of health related
scientific research, it should ultimately prove to be money well spent."⁵

⁴ "A Policy for Biomedical Research," 1971, p. 732.

Although  society  wants  the  output  of  research  to  be  improvement  of  health  care, 
much  research  is  used  directly  only  as  input  to  further  research.  This  suggests  that 
one might attempt to measure research output by its usefulness to other researchers.
Such measurements could help determine the extent to which NIH programs
are  effective  in  increasing  the  quality  and  quantity  of  biomedical  research. 

Targeted  Versus  Basic  Research  and  Administrative  Mechanisms 

I  first  consider  approaches  to  the  fundamental  question  of  the  relationship 
between basic research and applications. In the TRACES study,⁶ the researchers
chose five technological innovations and traced backward through history to identify
the key events that caused the ultimate innovation. Of all events discovered, they
classified 70 percent as nonmission research, 20 percent as mission-oriented research,
and 10 percent as development. Further case studies of this sort could
illuminate the relationship between targeted and basic research, particularly during
the crucial period when scientific knowledge is transferred to technical application.
However, such studies cannot directly address the usefulness of basic research
relative to directed research because, having begun with an application and
worked  backward,  they  will  not  discover  how  much  (if  any)  of  the  basic  research  was 
not  and  is  not  applicable  to  real  world  problems. 

A  case  study  design  that  would  be  more  relevant  might  select  a  group  of  basic 
research  projects  from  some  time  in  the  past  and  attempt  to  determine  the  influence 
these  projects  exerted  on  later  research  and  on  technology.  Garfield  (1972)  suggests 
a tool that might be used to discover relationships between scientific events. His
"historiograph" is a network diagram in which the nodes are scientific events represented
by pieces of scientific literature, and the paths between nodes are citations
from  one  article  to  previous  articles.  Production  of  such  diagrams  by  a  computer 
might  significantly  reduce  the  amount  of  manpower  required  to  discover  which 
applications  were  influenced  by  selected  pieces  of  basic  research. 
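
The forward tracing that such a diagram would support can be sketched in Python. This is a minimal illustration of following citations from a selected article to everything that cites it, directly or indirectly; it is not Garfield's procedure, and the article names and citation pairs are hypothetical.

```python
from collections import defaultdict, deque


def build_cited_by(citations):
    """Invert (citing, cited) pairs into a map: cited article -> citing articles."""
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cited_by[cited].add(citing)
    return cited_by


def influenced_by(seed, citations):
    """Return every article that cites `seed`, directly or through a chain."""
    cited_by = build_cited_by(citations)
    seen = set()
    queue = deque([seed])
    while queue:
        article = queue.popleft()
        for citing in cited_by.get(article, ()):
            if citing not in seen:
                seen.add(citing)
                queue.append(citing)
    return seen


# HYPOTHETICAL citation pairs, each written as (citing article, cited article).
citations = [
    ("applied-1", "basic-B"),
    ("basic-B", "basic-A"),
    ("basic-C", "basic-A"),
    ("applied-2", "basic-D"),
]

descendants = influenced_by("basic-A", citations)
print(sorted(descendants))
```

A citation link shows only that a later article referred to an earlier one; judging whether the earlier work actually influenced an application would still require the kind of case study the text describes.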

Identification of factors that influence the quality of research is a research area
directly related to the choice of administrative mechanisms. Although a small
amount of empirical work has been reported in the sociological literature,⁷ no firm
conclusions  can  be  drawn  from  it. 

The effect of age on research quality is of particular interest in view of the young
principal investigator problem mentioned earlier. In some fields of science,
particularly physics and mathematics, youth is acknowledged to be a definite asset. Kuhn
(1970) asserts that only the very young or people new to a scientific field are capable
of discoveries that result in scientific revolutions. This corollary follows directly
from his view of the nature of scientific progress, since older scientists are too
steeped in their "paradigm" to see the theory that can remove the anomalies of the
field.

5 The White House, 1965.

6 Illinois Institute of Technology, 1968.

7 For example, see Gordon and Marquis, 1966; Gordon and Morse, 1968.




A study by Douglass and James (1973) found that each year 10 percent of all the
principal investigators being supported through project grants were being supported
for the first time. It also examined the age, sex, and degrees held by these new
principal investigators. However, to our knowledge there has been no study of the
effect of age on biomedical research output. This is one factor to consider in further
work. Data from the Office of Financial Management may also permit study of the
effects of other inputs to research, such as technicians, graduate students, and
equipment expenditures.

The  organization  of  the  research  effort  is  also  of  interest.  Economic  theory 
suggests  that  economies  of  scale  may  be  found  in  biomedical  research  as  well  as  in 
other  areas.  If  this  is  true,  then  the  program  project  and  center  grants  should  result 
in  more  efficient  research  because  of 

•  Improved  communication  among  researchers  working  on  the  same  project, 
particularly  among  those  of  different  disciplines. 

•  Specialization  (e.g.,  the  person  with  the  most  ideas  is  the  project  leader;  he 
directs  those  of  somewhat  lesser  ability). 

•  More  efficient  use  of  resources,  such  as  technicians  and  equipment. 

The argument is not completely convincing when applied to research in medical
schools that receive most of these grants. For example, any member of a medical
school faculty who can get a research grant from NIH may already be so
effective that there is no payoff to specialization. It may also be true that
communication among research workers of different disciplines is not perceptibly enhanced by
program project and center grants because it is controlled by other factors such as
the physical location of laboratories, the structure of the curriculum, or other
policies of the management of the medical school.


Appendix  D 

ON  CITATIONS  FOR  NEGATIVE  REASONS 


The use of citations as a measure of the usefulness of a piece of research to the
scientific community is based on the assumption that citing authors choose to
reference works that have influenced their research in a productive manner. However,
occasionally an author references a work solely to criticize the cited work in some
manner, e.g., by pointing out an error in methodology. Do citations for negative
reasons occur frequently enough to distort the relationship between frequency
counts of citations and research output?

A sample of 45 citations was selected from the set of citations furnished by the
Institute for Scientific Information. Because it seems unlikely that an author will
cite his own work to discuss errors in it, all cases where an author was listed among
both the citing and cited authors were omitted from the sample.1 Publications that
were not in the English language were also omitted.

I read each of the citing articles in the sample. Forty-four of the 45 citations
were not made to discuss any problems in the cited work. In two of these 44 cases
the citing article contained new data that were inconsistent with a theory
suggested in the cited article. However, since it is commonly accepted that the
validation and invalidation of theory by experiment is a major part of the paradigm
of science, these were not classified as citations for negative reasons.

One of these 45 citations was made to discuss problems in the cited work. In this
case the citing article was devoted entirely to discussing the cited article. The citing
author was unable to reproduce the results of the cited author and came to the
conclusion that the control of the original experiment had been faulty. He
corresponded with the cited authors and, as reported in the citing article, the cited
authors agreed that there had been problems with their experiment and that their
conclusions had been incorrect. The cited article was published in October of 1968
and the criticism was published in the same journal within a year. In the six years
that have elapsed since its publication, no other citations of the criticized work have
appeared in the Science Citation Index.

In addition to the randomly selected articles, I also examined citations of three
highly cited articles. The articles were published by the only two grants that had
published articles among the most-cited 1 percent and also received below average
priority scores on renewal applications prepared after the highly cited articles were
published. It seemed possible that the high rate of citations might be due to citations
for negative reasons. The first of these grants had published three articles that were
within the top 1 percent of the most-cited articles. Only one other grant had
published at least this number of articles within the most-cited 1 percent. Out of these
three articles, the two most cited were chosen. Each of these articles contained a
description of a new technique and certain research results obtained with it. One
of the articles contained a review of much previously published research performed
by the authors before the start of the fiscal year 1967 grant period.

1 In addition to the articles discussed here, five self-citations were also read and none were found to
be for negative reasons.

Four citations to each of these articles were then selected, omitting articles
published by any of the coauthors. Of these eight articles, five used the techniques
developed by the cited authors as a major tool in their own work and three made
reference to the research results of the cited article as either a precursor or
confirmation of the results of the citing articles. In no case was there a reference that
could be classified as negative.

From this small sample of citations we can hypothesize that the majority of the
citations to these two articles were by researchers who used the methodology of the
cited paper rather than the research results. One could argue that the contribution
of a new technique is not as valuable a contribution to the progress of science as the
elucidation or verification of a theory. However, new methodology is clearly
important, and the authors who chose to reference these articles did in fact find this
methodology useful in their work.

The second of these two grants resulted in only one article in the most-cited 1
percent of the sample. This article is concerned with a phenomenon frequently
associated with severe virus infection. Four citations to this article were chosen.
Three of these citations were from clinical case studies and the other was a report
of research on animals. Each article was concerned with a separate disease problem;
in one of the clinical studies the data from the cited article were used as an important
part of an argument for a probable pathogenesis of the syndrome under study. In
another the cited article was offered as part of the refutation of a diagnosis
alternative to the one chosen by the authors. (The chosen diagnosis was the first of its kind
to be reported in the literature.) In the other two articles the cited article was
mentioned in passing as part of a larger body of information relevant to the current
work. I found no instance of negative citations.

I  concluded  from  these  readings  that  citations  of  articles  for  negative  reasons  are 
extremely  rare  and  unlikely  to  distort  the  use  of  frequency  counts  of  citations  as 
measures  of  research  output. 
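
As a rough check on how much a sample of this size can say, the following sketch
computes the observed share of negative citations in the random sample (1 of 45) and
an exact one-sided upper confidence bound for the underlying rate. This calculation is
not part of the original analysis; the choice of a 95 percent one-sided
Clopper-Pearson bound and the function names are assumptions made for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """Probability of observing k or fewer negative citations in n draws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k, n, alpha=0.05, tol=1e-9):
    """One-sided exact (Clopper-Pearson) upper bound on the true rate.

    Finds, by bisection, the rate p at which seeing k or fewer events
    in n draws would have probability alpha.
    """
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # k or fewer is still too likely, so the bound is higher
        else:
            hi = mid
    return (lo + hi) / 2


observed = 1 / 45  # one negative citation among the 45 sampled
print(f"observed rate: {observed:.3f}")
print(f"95% one-sided upper bound: {upper_bound(1, 45):.3f}")
```

The bound applies only to the 45 randomly selected citations; the additional
citations to highly cited articles were chosen deliberately and are not included.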


Appendix  E 

SUMMARY  OF  LINEAR  RELATIONSHIPS 


These  two  tables  summarize  all  the  linear  relationships  that  are  estimated  in 
Sections  II  and  V  of  this  report. 


Table  E-1 

SUMMARY  OF  LINEAR  RELATIONSHIPS,  SECTION  II 


Table  Dependent  Method of Fit             Quality of Fit      Independent Variables
       Variable

1      P_R        Maximum Likelihood Logit  χ² = 202.4 (3 df)   Variable      X_1       X_2       X_3
                                                                Coefficient   0.012     -0.105    -0.150
                                                                Significance  0.001     NS        0.001

3      S_R        Regression                R² = 0.14           Variable      X_1       X_2       [illeg.]
                                                                Coefficient   0.546     -7.10     0.845
                                                                Significance  0.001     0.01      NS

4      X_1        Regression                R² = 0.15           Variable      S_R       X_2
                                                                Coefficient   0.266     8.676
                                                                Significance  0.001     0.001

5      S_N        Regression                R² = 0.13           Variable      f         S̄         Z_72      Z_73
                                                                Coefficient   35.4      0.371     4.68      10.26
                                                                Significance  0.001     0.001     NS        0.01

5      P_N        Maximum Likelihood Logit  χ² = 165.2 (3 df)   Variable      f         S̄         Z_72      Z_73
                                                                Coefficient   1.46      0.0028    -0.085    -0.296
                                                                Significance  0.001     0.001     NS        0.01

6      S_R        Regression                R² = 0.25           Variable      X_1       S̄         f         X_2
                                                                Coefficient   0.469     0.193     10.25     -0.19
                                                                Significance  0.001     0.001     NS        0.05

Definition of Variables

P_R  = Probability that an application for renewal will be recommended for disapproval.

X_1  = Priority score received on the preceding application when the grant was funded.

X_2  = 1 if previous application was itself a renewal.
       0 if previous application was for a new grant.

X_3  = Year of application (1968: X_3 = 0, etc.).

S_R  = Priority score received on a renewal application.

S_N  = Priority score received on a new application.

f    = Fraction of all earlier applications from the same investigator that have been
       disapproved.

S̄    = Average score received on all earlier applications from the same individual.

Z_72 = 1 if application was for FY 1972.
       0 otherwise.

Z_73 = 1 if application was for FY 1973.
       0 otherwise.

P_N  = Probability that a new application will be recommended for disapproval.
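
Because the logit rows of Table E-1 report coefficients on the log-odds scale, a short
sketch may help in reading them. It converts a coefficient into the multiplicative
change in the odds of disapproval for a given change in the variable. The coefficients
0.012 and -0.150 are taken from the first logit row of the table; reading them as the
coefficients on the preceding priority score and on year of application, the
100-point and one-year changes, and the function name are all assumptions made for
illustration. No probabilities are computed because the table reports no constant term.

```python
from math import exp


def odds_ratio(coefficient, change):
    """Multiplicative change in the odds implied by a logit coefficient."""
    return exp(coefficient * change)


# Coefficients from the first logit row of Table E-1 (dependent variable:
# probability that a renewal application is recommended for disapproval).
coef_prior_score = 0.012  # assumed to be per point of the preceding priority score
coef_year = -0.150        # assumed to be per year of application

# Hypothetical comparison: two applications whose preceding scores differ by 100 points.
print(round(odds_ratio(coef_prior_score, 100), 2))

# Hypothetical comparison: applications made one year apart.
print(round(odds_ratio(coef_year, 1), 2))
```

An odds ratio above 1 means the odds of a disapproval recommendation rise with the
variable; a ratio below 1 means they fall.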




Table  E-2 

SUMMARY  OF  LINEAR  RELATIONSHIPS,  SECTION  V 


Table 27.  Dependent variable: S

Data From Group (Quality of Fit)  Independent Variables

Variables                         P_j       [illeg.]  [illeg.]  [illeg.]  P_t       C_t       X_1       X_2

Basic Science (R² = 0.22)
  Coefficient                     -0.277    -1.892    -3.01     -1.17     0.33      -3.74     5.49      -0.66
  Significance                    NS        0.001     NS        NS        NS        NS        NS        NS
Medical (R² = 0.17)
  Coefficient                     -0.461    -2.100    7.91      1.23      0.42      3.10      3.23      -0.78
  Significance                    NS        0.001     NS        NS        NS        NS        NS        0.05
Other (R² = 0.03)
  Coefficient                     0.268     -0.514    -6.69     -1.53     2.02      5.33      14.31     -0.18
  Significance                    NS        NS        NS        NS        NS        NS        NS        NS


Table 28.  Dependent variable: S

Variables                         P_a       C_a       X_1       X_2

Basic Science (R² = 0.21)
  Coefficient                     -0.483    -1.753    2.19      -0.88
  Significance                    NS        0.001     NS        NS
Medical (R² = 0.16)
  Coefficient                     0.872     -1.954    6.35      -0.69
  Significance                    NS        NS        NS        0.05
Other (R² = 0.01)
  Coefficient                     -0.39     -0.021    2.89      -0.22
  Significance                    NS        NS        NS        NS


Table 29.  Dependent variable: S

Variables                         P_a       C_a       X_1       X_2       [illeg.]

Basic Science (R² = 0.26)
  Coefficient                     -0.89     -1.59     5.37      -0.53     0.393
  Significance                    NS        0.001     NS        NS        0.01
Medical (R² = 0.24)
  Coefficient                     0.76      -1.50     1.11      -0.53     0.406
  Significance                    NS        0.001     NS        NS        0.001
Other (R² = 0.12)
  Coefficient                     -0.75     -0.569    2.58      -0.05     0.580
  Significance                    NS        NS        NS        NS        0.01


Table 31.  Data from group: Basic Science and Medical

Variables                         X_1       X_4       log(X_[illeg.])     X_6

Dependent variable P (R² = 0.24)
  Coefficient                     0.46      0.21      0.36                0.21
  Significance                    0.001     0.01      0.001               0.01
Dependent variable [illeg.] (R² = 0.013)
  Coefficient                     0.42      0.23      0.38                0.31
  Significance                    0.001     NS        0.001               0.01


Table 32.  Data from group: Basic Science and Medical

Variables                         X_1       log(X_[illeg.])     X_4       log(X_[illeg.])     X_7       [illeg.]

Dependent variable P (R² = 0.25)
  Coefficient                     0.38      -0.11               -0.000    0.05                0.51      0.19
  Significance                    0.001     NS                  NS        NS                  0.01      0.001
Dependent variable P (R² = 0.22)
  Coefficient                     0.45      0.33                -0.000    --                  --        0.22
  Significance                    0.001     0.001               NS        --                  --        0.001
Dependent variable C (R² = 0.14)
  Coefficient                     0.35      -0.21               -0.003    -0.02               0.60      0.15
  Significance                    0.01      NS                  0.05      NS                  0.05      0.05
Dependent variable C (R² = 0.12)
  Coefficient                     0.43      0.31                -0.003    --                  --        0.17
  Significance                    0.001     0.01                0.05      --                  --        0.05

NOTE:  NS  =  Not  significant. 

Definition of Variables

S   = Priority score received on first competing renewal application following FY 1967.

P_i = Number of publications of Type i, where
        i = j for journal articles
        i = b for books
        i = t for talks, theses, and abstracts
        i = a for all publications cited at least twice.

C_i = Average citation rate of publications of Type i.

X_1 = 1 if FY 1967 grant was a renewal grant.
      0 if FY 1967 grant was a new grant.

X_2 = Dollars awarded in FY 1967.

X_3 = Priority score received on FY 1967 application.

X_4 = Years of support received between FY 1967 and FY 1970.

X_5 = Total dollars awarded between FY 1967 and FY 1970.

X_6 = 1 if grantee was among the 14 schools that received the largest share of NIH grant awards in FY 1967.
      0 otherwise.

X_7 = Number of years of support committed in FY 1967.


REFERENCES


"A Policy for Biomedical Research," Report of AAMC, Journal of Medical Education,
Vol. 46, No. 8, August, 1971, pp. 691-743.

Bayer, Alan E., and John Folger, "Some Correlates of a Citation Measure of
Productivity in Science," Sociology of Education, Vol. 39, No. 4, 1966, pp. 382-390.

Berliner, Robert W., and Thomas J. Kennedy, "National Expenditures for Biomedical
Research," Journal of Medical Education, Vol. 45, No. 9, September 1970,
pp. 666-678.

Carter, G. M., D. S. C. Chu, J. E. Koehler, R. L. Slighton, and A. P. Williams, Jr.,
Federal Manpower Legislation and the Academic Health Centers: An Interim
Report, The Rand Corporation, Santa Monica, R-1464-HEW, April 1974.

Division of Research Grants, Information and Instruction Handbook, No. VI-A,
Definitions and Specifications: Pending and Open Master Files of IMPAC
System, National Institutes of Health, Bethesda, Md., July 1971.

Division of Research Grants, Research Grants Index, published annually by National
Institutes of Health, Bethesda, Md.

Douglass,  Carl  D.,  and  John  C.  James,  "Support  of  New  Principal  Investigators  by 
NIH:  1966  to  1972,”  Science,  Vol.  181,  July  1973,  pp.  241-244. 

Dubridge,  Lee  A.,  "Science  Serves  Society,”  Science,  Vol.  164,  June  1969,  pp.  1137- 
1140. 

Edwards, Charles C., Speech made on Feb. 21, 1974, before intramural scientists at
NIH Clinical Center, reprinted in Drug Research Reports (The Blue Sheet),
Vol. 17, No. 10, March 6, 1974, pp. S2-S5.

Garfield, E., "Citation Indexing for Studying Science," Nature, Vol. 227, 1970, pp.
669-671.

Garfield, E., "Historiographs, Librarianship and the History of Science," reprinted in
Institute for Scientific Information, 1972.

Gordon, Gerald, and Sue Marquis, "Freedom, Visibility of Consequences and
Innovation," American Journal of Sociology, Vol. 72, No. 2, September, 1966, pp.
195-202.

Gordon, Gerald, and Edward V. Morse, "Creative Potential and Organizational
Structure," Proceedings of Academy of Management, 1968, pp. 0)1-12.

Hagstrom, Warren O., "Inputs, Outputs and the Prestige of University Science
Departments," Sociology of Education, Vol. 44, Fall 1971, pp. 375-397.

Hearings,  Department  of  Labor  and  HEW  appropriations  for  1972,  92d  Cong.  1st 
Sess.,  Part  3,  National  Institutes  of  Health. 

Hearings,  Department  of  Labor  and  HEW  appropriations  for  1973,  92d  Cong.  2d 
Sess.  Part  4,  National  Institutes  of  Health,  Part  7,  Testimony  of  Members  of 
Congress  and  Interested  Individuals  and  Organizations. 

Illinois Institute of Technology, Technology in Retrospect and Critical Events in
Science, December 1968, a report prepared for the National Science
Foundation under Contract NSF-C535.

Inhaber,  Herbert,  "Is  There  a  Pecking  Order  in  Physics  Journals?”  Physics  Today, 
May  1974,  pp.  39-43. 




Institute  for  Scientific  Information,  Science  Citation  Index,  1972  Guide  and  Journal 
List,  Philadelphia,  1972. 

Kuhn, Thomas S., "The Structure of Scientific Revolutions," International
Encyclopedia of Unified Science, Vol. 2, No. 2, University of Chicago Press, 1970.

Margolis, J., "Citation Indexing and Evaluation of Scientific Papers," Science, Vol.
155, March 1967, pp. 1213-1219.

Martino, Joseph P., "Citation Indexing for Research and Development
Management," IEEE Transactions Engineering Management, Vol. EM18, 1971, pp.
146-151.

"Medical Education in The United States," JAMA, Vol. 22, No. 8, November 20,
1972, pp. 961-1048.

Morris, Carl, and John Rolph, Introduction to Statistics and Data Analysis with
Computer Applications II, P-4696, The Rand Corporation, Santa Monica,
September 1971.

Narin, Francis, M. Carpenter, and N. Berlt, "Interrelationships of Scientific
Journals," Journal of the American Society for Information Science,
September-October 1972, Vol. 23, No. 5, pp. 323-331.

Office  of  Management  and  Budget,  "NIH  and  NIMH  Peer  Review  System,”  Drug 
Research  Reports,  Vol.  16,  No.  22,  May  30,  1973,  pp.  4-9. 

Orr,  R.  H.,  and  J.  L.  Kassab,  "Peer  Group  Judgments  on  Scientific  Merit:  Editorial 
Refereeing,”  presented  at  the  Cong.  Int.  Fed.  Documentation,  Washington, 
D.C. 

Saunders, J. Palmer, and Mordecai H. Gordon, "NIH Study Section Ratings:
Scientific Merit or Order of Payment," National Cancer Institute, 1965 (mimeo.).

Shannon,  James  A.,  "Biomedical  Sciences-Present  Status  and  Problems,”  in  Science, 
Government  and  the  Universities,  University  of  Washington  Press,  Seattle, 
1966. 

Strickland,  Stephen  P.,  Politics,  Science  and  Dread  Disease,  Harvard  University 
Press,  Cambridge,  Massachusetts,  1972. 

The White House, Biomedical Science and Its Administration, Washington, D.C.,
1965 (The Wooldridge Committee).

Weisbrod,  Burton  A.,  "Costs  and  Benefits  of  Medical  Research:  A  Case  Study  of 
Poliomyelitis,”  Journal  of  Political  Economy,  Vol.  79,  No.  3,  May-June  1971, 
pp.  527-544. 

Williams, A. P., Finding Representative Academic Health Centers, The Rand
Corporation (forthcoming).

York,  Carl  M.,  "Steps  Toward  a  National  Policy  for  Academic  Science,”  Science,  Vol. 
172,  May  14,  1971,  pp.  643-648. 




